Reviewing the Quality of “Big Data” in Automatic Data Systems: An Example
Abstract
In recent decades there has been extraordinary growth in, and acceptance of, automatic data systems that collect official and popular reports of epidemic occurrence. While different systems employ one or another proprietary algorithm to collect and parse disease reports, all include, at a minimum, spatial locators, the date of a report, and the number of individual cases reported. These systems have become increasingly vital both in the study of individual epidemics and in the exposition of expanding epidemics in real time. To date, however, there has been little analysis of the nature and quality of the data collected in these “big-net” programs or of the degree to which redundancies and uncertainties may limit their utility. Here, data on the 2009 H1N1 Type-A influenza epidemic gathered by a single system, healthmap.org, are parsed to determine where problems exist and how they might be rectified.
Article Details
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.