There are two similar instances in my academic career when I have felt extremely frustrated. The first was forty years ago, when I presented a paper at the Royal Statistical Society in London1. The gathered, well-known statisticians of the time told me that the reason for our conclusion that simple forecasting methods (like single exponential smoothing) were at least as accurate as statistically sophisticated ones (like ARIMA models) was that Hibon and I did not know how to use the sophisticated methods. This led to organizing the first M Competition, which proved that our conclusion was indeed correct.
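For readers unfamiliar with the simpler of the two method families mentioned above, here is a minimal sketch of single exponential smoothing (the `ses_forecast` function and the sample series are illustrative only, not taken from the paper):

```python
def ses_forecast(y, alpha=0.3, horizon=1):
    """Single (simple) exponential smoothing: each observation updates
    the smoothed level, and the forecast is flat at the final level."""
    level = y[0]  # initialize the level with the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon  # the same flat forecast for every horizon

# Illustrative usage on a short, made-up series
series = [10, 12, 11, 13, 12, 14]
print(ses_forecast(series, alpha=0.3, horizon=3))
```

Despite this simplicity (a single smoothing parameter, no model identification step), such methods proved hard to beat on the M Competition data.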
The second instance was five months ago, when two colleagues and I submitted a paper2 for publication in Neural Networks. The paper was rejected without being sent to referees, and we received the following report from the Action Editor: “Based on the contents of the paper, I think it does not contain enough contribution to be sent to possible reviewers. The paper basically presents a comparison of standard models, from the so-called machine learning group, with statistical models in forecasting time series benchmarks. There are many new machine learning models that have proved to overcome the results provided by statistical models, in many competitions, using the same benchmark datasets (the bold lettering is used to highlight the reason provided). Therefore, I recommend that the paper should be rejected.”
We emailed the handling editor, asking him/her to point us to the ML studies that have proven superior to statistical models. We received no reply, so we emailed the Editor-in-Chief of the journal, asking him to identify the studies mentioned by the Action Editor. After numerous emails, we finally sent the following one:
“We are still waiting for your response. At the very least, we would like to know the publications that have shown ML forecasting methods to be more accurate than the traditional statistical ones, since we cannot find such studies after an exhaustive search of both Google Scholar and Scopus.
We believe we deserve an answer, and we are looking forward to your reply. Not getting such an answer to our numerous emails is “anti-scientific” and against any rules of academic behavior.
Thank you for finally responding to our emails in a period when replicability and scientific objectivity are at the forefront of academic concerns.”
We received no answer to the email above, and this motivated the M4 Competition. I strongly feel it is time to set the record straight and determine the real performance of ML methods in comparison to statistical ones; this is the purpose of my new competition. I strongly believe that scientific objectivity is of paramount importance and cannot be obstructed by vested personal interests. It is my hope that the Editor-in-Chief of Neural Networks will ask his/her Action Editor to share with us, and with everyone else, the ML methods that have been found, in many competitions, to be more accurate than the statistical ones, since we have not been able to locate these when searching the literature. I am sending this blog post, along with an email, to the Editor-in-Chief of Neural Networks as a last resort, asking for a reply to our request and an explanation of why he has not replied to our numerous emails. I hope for a response this time. Otherwise, I plan to report this highly non-academic behavior to an ethics authority for investigation, hoping that scientific integrity will be assured.
The above describes the motivation behind the M4 Competition. In retrospect, however, I would like to thank Neural Networks for motivating me to start the M4 Competition, which will provide the following invaluable contributions to the forecasting field.
- There will be a new database of 100,000 time series, available to the field for testing new methods and improving forecasting accuracy.
- The majority of the methods will be run under “full reproducibility” (as defined by Boylan et al.3), meaning their forecasts can be fully reproduced by anyone wishing to do so; the programs used to generate them will be made available on GitHub, so that companies and individuals can utilize them if they so wish.
- There will be ten widely used benchmark methods that require minimal computing and have become standard in the forecasting literature (their code will also be available). Most importantly, it will now be easier to judge new methods by comparing them against these benchmarks. For practical purposes, there will also be a measure of the computational requirements of the various methods, to allow deciding on the trade-offs between the higher accuracy and the greater computing time of a given method.
- Given the large number of series, methods, and forecasts, there will be more than a quarter of a million data points on which to apply data analytics, to determine the factors affecting forecasting accuracy in various situations.
- A combination of two accuracy measures will be used to judge the performance of the various methods. Some ask, “why only two and not more?” One reason is that we do not want to increase complexity; the other is that we expect other researchers to contribute to the debate over the most appropriate accuracy measure once the M4 data become available, and to arrive at some consensus on the most suitable one.
- There will be monetary prizes for the winners, and I encourage firms willing to contribute to increasing the size of these prizes to contact me.
- Hopefully, there will be additional benefits from the M4 that improve forecasting accuracy and increase the relevance of the field in real-life applications.
1Makridakis, S., Hibon, M., 1979, “Accuracy of Forecasting: An Empirical Investigation” (with discussion), Journal of the Royal Statistical Society, Series A, Vol. 142, Part 2, pp. 79-145 (lead article).
2Makridakis, S., Spiliotis, E., and Assimakopoulos, V., 2017, “The Accuracy of Machine Learning (ML) Forecasting Methods versus Statistical Ones: Extending the Results of the M3-Competition”, https://www.researchgate.net/publication/320298859_The_Accuracy_of_Machine_Learning_ML_Forecasting_Methods_versus_Statistical_Ones_Extending_the_Results_of_the_M3-Competition
3Boylan, J. E. et al., 2015, “Reproducibility in Forecasting Research”, International Journal of Forecasting, Vol. 31, pp. 79-90.