A Comparison of the Prediction Capabilities of Large Scale Time Series Algorithms

Susan J. Simmons, Kornelia Bastin, Aric LaBarr, Christopher G. Healey

Abstract

Since December 31, 2019, the world has closely monitored the progress and outcomes of the SARS-CoV-2 coronavirus (COVID). This paper has two goals. First, we compare time series algorithms for predicting fatalities during the COVID pandemic. Second, we examine how domain affects algorithm choice by comparing our COVID results to analyses of historical and current weekly temperature data. Tracking and predicting the effects of COVID is of critical interest. Over the past three years, many researchers have created models and built visualizations to observe this disease’s progression and impact, both regionally and worldwide, and researchers have recently proposed using machine learning to forecast its progression. Given the increased interest in time series methods and the range of algorithms available, this paper explores the accuracy and computational expense of these techniques. We compare time series approaches for forecasting COVID fatalities from March 11, 2020, to December 28, 2021. We include only time series models that can be created automatically and therefore scale to large datasets. Statistical analysis is used to identify significant differences in performance. To investigate generalizability, we apply the same algorithms to temperature data, a standard example dataset because of its seasonal and trend components, using both historical data (1970s) and current data (2020s). Results allow us to: (1) identify significant differences in algorithm performance on pandemic data with different time series patterns; (2) examine the performance of time series algorithms trained on shorter, constant-length training sets; and (3) determine whether variations in temperature due to climate change affect how temperature data should now be predicted.
We conclude by discussing how domain and data patterns inform the choice of time series algorithms when predicting future events from historical or existing data. Our results illustrate that no single method is always best. The data’s domain, the time period in question, and the length of time to analyze must all be weighed carefully when choosing an algorithm.
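
As an illustration of evaluating forecasters on constant-length training sets, the following minimal Python sketch scores two simple methods with a rolling origin and a fixed-size training window. It is not the procedure or the set of algorithms used in the article: the two forecasters (a naive last-value forecast and simple exponential smoothing), the function names, the window length, and the synthetic series are all assumptions chosen only to keep the example self-contained.

```python
from statistics import mean


def naive_forecast(train, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return [train[-1]] * horizon


def ses_forecast(train, horizon, alpha=0.5):
    """Simple exponential smoothing; the flat forecast is the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def rolling_mae(series, forecaster, window, horizon):
    """Mean absolute error over rolling origins with a fixed-length window."""
    errors = []
    for start in range(len(series) - window - horizon + 1):
        train = series[start:start + window]
        actual = series[start + window:start + window + horizon]
        predicted = forecaster(train, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return mean(errors)


# Synthetic weekly series with a linear trend (illustrative data only,
# not COVID or temperature observations).
series = [10.0 + 2.0 * t for t in range(30)]

scores = {
    "naive": rolling_mae(series, naive_forecast, window=8, horizon=1),
    "ses": rolling_mae(series, ses_forecast, window=8, horizon=1),
}

for name, score in scores.items():
    print(f"{name}: MAE = {score:.3f}")
```

On this trending toy series the naive forecast has the lower error, because exponential smoothing lags behind a steady trend. A series with a different pattern could reverse the ranking, which is the kind of data dependence the abstract describes.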

Keywords: COVID, time series, temperature

Article Details

How to Cite
SIMMONS, Susan J. et al. A Comparison of the Prediction Capabilities of Large Scale Time Series Algorithms. Medical Research Archives, [S.l.], v. 11, n. 2, feb. 2023. ISSN 2375-1924. Available at: <https://esmed.org/MRA/mra/article/view/3620>. Date accessed: 16 apr. 2024. doi: https://doi.org/10.18103/mra.v11i2.3620.
Section
Research Articles
