A scoping review of AI/ML algorithm updating practices for model continuity and patient safety using a simplified checklist

Ahmed Umar Otokiti, MD, MPH, MBA; Huan-ju Shih, PhD(c); Makuochukwu Maryann Ozoude, MD; Ilse Siguachi, MS; Leyla B. Warsame, MD; Karmen S. Williams, DrPH, MBA; Seyi John Akinloye, BDS

Abstract

Objective: To evaluate the extent to which clinical artificial intelligence (AI) and machine learning (ML) models prioritize updating, transparency, and demographic reporting in the published literature.


Patients and Methods: A systematic review of clinical AI/ML models was conducted in accordance with PRISMA guidelines from March 2020 until December 2021. A new checklist and scoring system were introduced to assess model quality, with additional evaluation of demographic reporting, particularly by race and ethnicity. A comprehensive search was performed across six major databases, including Ovid Embase, MEDLINE, and the Cochrane Library. Eligible studies, regardless of design, described human-based predictive or prognostic AI/ML models that used supervised learning and at least two predictors. Studies not meeting these criteria were excluded.


Results: Out of 390 AI/ML studies reviewed, only 9% mentioned plans or methods for future model updates. The vast majority (98%) of models were still in the research phase, and only 2% had reached production. Additionally, only 12% adhered to best practices in model development, and 84% failed to report demographic composition by race or ethnicity.


Conclusion: These findings highlight key limitations in the current clinical AI landscape—especially a lack of transparency, limited readiness for deployment, and minimal consideration for inclusivity or generalizability. Greater focus on model updating, adherence to development standards, and demographic transparency is essential to improve the safety, reliability, and equity of clinical AI/ML models.

Keywords: Artificial intelligence, machine learning, model updating

Article Details

How to Cite
OTOKITI, Ahmed Umar et al. A scoping review of AI/ML algorithm updating practices for model continuity and patient safety using a simplified checklist. Medical Research Archives, [S.l.], v. 13, n. 12, jan. 2026. ISSN 2375-1924. Available at: <https://esmed.org/MRA/mra/article/view/7083>. Date accessed: 21 jan. 2026. doi: https://doi.org/10.18103/mra.v13i12.7083.
Section
Research Articles
