[IJF special section: Epidemics and forecasting with focus on COVID-19]
Jennifer L. Castle, Jurgen A. Doornik and David F. Hendry
COVID-19 currently dominates the physical and economic well-being of individuals and societies globally. We have published real-time short-term forecasts of confirmed cases and deaths for many parts of the world on an almost daily basis from 20 March 2020 onwards on www.doornik.com/COVID-19. These forecasts have largely been reliable indicators of what can be expected to happen in the next week.
Models based on well-established theoretical understanding supported by available evidence are crucial to viable policy-making in observational-data disciplines like economics, other social sciences and epidemiology. Despite that, economics has witnessed a long history of relatively simple data-based devices ‘out-forecasting’ formal ‘structural’ models. The same phenomenon occurs in other subjects. The reason is that shifts in the distributions of variables from their past behaviour lead to systematic mis-forecasting in all models in the equilibrium-correction class. That class comprises most widely-used models, from regressions, scalar and vector autoregressions, through cointegrated systems, to volatility models like ARCH and GARCH.
Models of pandemics such as the present coronavirus face this problem of non-stationarity. Epidemiological models have a sound theoretical basis and a history of useful applications. Nevertheless, novel viruses may behave differently from what models assume, and policy reactions to early predictions of mass deaths (e.g., mandatory lockdowns) can shift distributions suddenly in ways that are difficult to model formally. With slow starts followed by exponential increases that gradually slow, pandemic data are highly non-stationary. Indeed, the methodologies used for reporting the pandemic data are also non-stationary, with stochastic trends, such as the ramping up of testing, and distributional shifts, such as the sudden inclusion of care home cases. Thus, there is a compounding effect as the non-stationarity of the underlying data interacts with the non-stationarity of the reporting process.
Viable forecasting models must be able to handle this quadruple non-stationarity: two forms (stochastic trends and shifts) from two sources (outcomes and measurements thereof). Epidemiological models can be driven too strongly by their assumptions which, combined with their assumed mathematical processes, can limit their usefulness in forecasting because they are not empirical enough. As a consequence, there is an important role in short-term forecasting after distributional shifts for adaptive data-based models from a class we call ‘robust’, namely devices that avoid systematic forecast failure after sudden distributional shifts. However, when forecasting, that adaptability must remain firmly controlled to avoid excess volatility. An additional use of such models is that a noticeable drop in outcomes relative to baseline extrapolations can be an indication that policies are having a positive impact.
Our forecasts are derived from such ‘robust’ models. Our aim was to provide short-term forecasts of the numbers of confirmed cases and of deaths attributed to COVID-19 as a guide to planning for the next few days, so that the next reported numbers would not be a surprise. The basis of these reported figures differs across countries and time: some countries just use hospital counts, some add in other locations like care homes, some do much more infection testing whereas others only record cases serious enough to need medical intervention. Occasionally, the basis is switched. Thus, our forecasts are of the next reports for the recording system then in use. Robustness after changes in the reporting basis is also an important attribute of our methods: the next forecast will obviously be incorrect, but updating the later forecasts will move them quickly back on track.
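The contrast between the two kinds of device after a sudden shift can be illustrated with a toy example. The sketch below is not the forecasting method used for the published forecasts: it assumes a synthetic series with a single location shift, uses a full-sample mean as the simplest stand-in for an equilibrium-correction model, and uses the last observation as the simplest stand-in for a robust device. All function names are hypothetical.

```python
from statistics import mean


def mean_forecast(history):
    """Stand-in for an equilibrium-correction model: forecast the full-sample mean."""
    return mean(history)


def last_value_forecast(history):
    """Stand-in for a robust device: forecast the most recent observation."""
    return history[-1]


def one_step_errors(series, forecaster, start):
    """Absolute 1-step-ahead errors, re-forecasting as each new observation arrives."""
    return [abs(series[t] - forecaster(series[:t])) for t in range(start, len(series))]


# Synthetic data (an assumption for illustration): the level jumps from 10 to 20
# at period 20, as might happen when a reporting basis is switched.
series = [10.0] * 20 + [20.0] * 10

mean_errors = one_step_errors(series, mean_forecast, start=20)
robust_errors = one_step_errors(series, last_value_forecast, start=20)

print("mean-based errors:", [round(e, 2) for e in mean_errors])
print("last-value errors:", [round(e, 2) for e in robust_errors])
```

Both devices miss the first post-shift observation, but the last-value device is back on track one period later, whereas the mean-based device keeps under-forecasting because it is pulled towards the old level.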
The methodology to construct the short-term forecasts involves several steps. The observed daily time series is first decomposed into a trend and a remainder. The trend is estimated by taking moving windows of the data and saturating these with linear trends. Selection from these trends is made by an econometric machine-learning algorithm, and the selected linear trends are then averaged to give the overall flexible trend. Next, the trend and remainder terms are forecast separately using the Cardt method and recombined in a final forecast. Cardt is an improved version of the method that we used in the M4 competition (International Journal of Forecasting, 2020). A second, averaged forecast is also reported, in which many forecast paths are averaged because countries differ substantially in their degrees of variation.
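The decomposition step can be sketched in a much simplified form. The code below is an illustration only, under stated assumptions: it fits one ordinary least-squares line per moving window and averages all fitted lines at each date, so it omits both the saturation by many candidate trends and the model-selection step described above, and it does not implement Cardt. The function names and the default window length of 7 are hypothetical choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flexible_trend(y, window=7):
    """Average, at each date, the fitted values of every window line covering it."""
    n = len(y)
    if not 2 <= window <= n:
        raise ValueError("window must be between 2 and len(y)")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        xs = list(range(start, start + window))
        intercept, slope = fit_line(xs, y[start:start + window])
        for t in xs:
            totals[t] += intercept + slope * t
            counts[t] += 1
    return [total / count for total, count in zip(totals, counts)]


def decompose(y, window=7):
    """Split a series into a flexible trend and the remainder around it."""
    trend = flexible_trend(y, window)
    remainder = [obs - fit for obs, fit in zip(y, trend)]
    return trend, remainder


# Synthetic cumulative counts (an assumption for illustration only).
counts = [1, 2, 4, 7, 12, 20, 31, 45, 60, 78, 95, 110, 122, 131, 138]
trend, remainder = decompose(counts, window=5)
print([round(v, 1) for v in trend])
```

In this simplified version every window contributes equally; in the procedure described above, only the linear trends retained by the selection algorithm would enter the average.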
The following graph, taken from www.doornik.com/COVID-19 on 2020-04-21 (slightly edited for clarity here), shows the number of deaths from COVID-19 for the World from our data sources listed in the paper (the thick grey line with dots). The estimated trend (blue dotted line) and forecasts (red solid line) with 80% confidence intervals are plotted, along with the average forecast up to three days back given in black. The teal line plots the scenario forecast for the World with forecast intervals, explained in the paragraph below. Further information and references can be found on our COVID-19 web site.
Estimates of the peak in daily counts are made from an averaged smoothed trend constructed for cumulative counts while creating the many forecasts. To allow for a genuine slowdown in new cases and deaths, we create four scenarios based on the Chinese experience earlier in the year. Automatic model selection is then used to select the closest scenario mix from a flexible lag length, so initial models have more variables than observations. These scenario forecasts are introduced when a slowdown is detected in order to have sufficient data on the paths to select the scenarios. We expect the scenario forecasts to become more reliable the further along a country is in recovery. This approach allows us to introduce stylized facts from the Chinese experience. The empirical distribution of daily confirmed counts earlier this year in China was highly skewed, with about 2/3 of the mass beyond the peak. Daily deaths were less skewed, but closer to a straight trend up, followed by a straight trend down.
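The skewness statistic quoted above, the share of total daily counts falling after the peak day, is straightforward to compute. The sketch below uses made-up series rather than the Chinese data, and it assumes that 'beyond the peak' means strictly after the peak day; the function name is hypothetical.

```python
def share_after_peak(daily):
    """Fraction of the total of daily counts that falls strictly after the peak day."""
    total = sum(daily)
    if total <= 0:
        raise ValueError("daily counts must have a positive total")
    # Index of the first day on which the maximum daily count occurs.
    peak_day = max(range(len(daily)), key=daily.__getitem__)
    return sum(daily[peak_day + 1:]) / total


# Synthetic shapes (assumptions for illustration, not observed data).
symmetric = [1, 2, 3, 4, 3, 2, 1]      # straight up, then straight down
right_skewed = [3, 6, 5, 4, 3, 2, 1]   # fast rise, slow decline

print(share_after_peak(symmetric))     # 0.375
print(share_after_peak(right_skewed))  # 0.625
```

A share well above one half, as in the second series, indicates a distribution with a long right tail, which is the shape described above for confirmed cases.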
A table on the webpage records information about the peak increase in the estimated trend for deaths, including when it occurred and the number of days elapsed since the peak. A growing number of countries entering that table provides hope that the dramatic economic costs of major lockdowns have been worthwhile, and that these lockdowns could start to end.
The webpage also provides an initial visual evaluation of our forecasts. The following graph considers confirmed cases for the US, taken from www.doornik.com/COVID-19 on 2020-04-21. Observed values are given by the thick grey line with dots. The solid red and black lines are the out-of-sample forecasts and average forecasts that were made, the former shown with confidence bands. In this case, the forecasts have given good indications of what was about to happen in the next week, although they rose too steeply over the last few observations.