Strategies and Considerations for Safe Reinforcement Learning in Programming Cardiac Implantable Electronic Devices

John Komp; Aaptha Boggaram; David P. Kao; Ashutosh Trivedi; Michael A. Rosenberg

doi:10.18103/mra.v13i3.6363

Published Mar 29, 2025

John Komp

College of Engineering and Applied Science, University of Colorado, Boulder, CO, USA

Aaptha Boggaram

College of Engineering and Applied Science, University of Colorado, Boulder, CO, USA

David P. Kao

Division of Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

Ashutosh Trivedi

College of Engineering and Applied Science, University of Colorado, Boulder, CO, USA

Michael A. Rosenberg

Division of Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

Abstract

The programming of cardiac implantable electronic devices, such as pacemakers and implantable defibrillators, represents a promising domain for the application of automated learning systems. These systems, leveraging a type of artificial intelligence called reinforcement learning, have the potential to personalize medical treatment by adapting device settings based on an individual’s physiological responses. At the core of these self-learning algorithms is the principle of balancing exploration and exploitation. Exploitation refers to the selection of device programming settings previously demonstrated to provide clinical benefit, while exploration refers to the real-time search for adjustments to device programming that could provide an improvement in clinical outcomes for each individual. Exploration is a critical component of the reinforcement learning algorithm, and provides the opportunity to identify settings that could directly benefit individual patients. However, unconstrained exploration poses risks, as an automated change in certain settings may lead to adverse clinical outcomes. To mitigate these risks, several strategies have been proposed to ensure that algorithm-driven programming changes achieve the desired level of individualized optimization without compromising patient safety. In this review, we examine the existing literature on safe reinforcement learning algorithms in automated systems and discuss their potential application to the programming of cardiac implantable electronic devices.
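The balance between exploitation and constrained exploration described above can be illustrated with a minimal sketch. It is not the authors' method: it is a generic epsilon-greedy rule whose choices are restricted to a pre-approved safe set, and every name and number in it (`choose_setting`, `update`, `toy_reward`, the candidate rates in beats per minute, the safe set, and the reward shape) is a hypothetical assumption chosen only for illustration.

```python
import random


def choose_setting(q_values, safe_settings, epsilon, rng):
    """Epsilon-greedy choice restricted to a pre-approved safe set (hypothetical)."""
    candidates = [s for s in q_values if s in safe_settings]
    if not candidates:
        raise ValueError("no safe setting available")
    if rng.random() < epsilon:
        # Exploration: try a random setting, but only from the safe set.
        return rng.choice(candidates)
    # Exploitation: pick the safe setting with the best estimated benefit.
    return max(candidates, key=lambda s: q_values[s])


def update(q_values, counts, setting, reward):
    """Incremental running-mean update of the estimated benefit of a setting."""
    counts[setting] += 1
    q_values[setting] += (reward - q_values[setting]) / counts[setting]


def toy_reward(setting, rng):
    """Made-up response signal that peaks at 70; not a physiological model."""
    return -abs(setting - 70) / 10 + rng.gauss(0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    settings = [50, 60, 70, 80, 90]   # hypothetical candidate pacing rates
    safe = {60, 70, 80}               # assumed clinician-approved subset
    q = {s: 0.0 for s in settings}
    n = {s: 0 for s in settings}
    for _ in range(500):
        s = choose_setting(q, safe, epsilon=0.1, rng=rng)
        update(q, n, s, toy_reward(s, rng))
    print("visit counts:", n)
```

In this sketch the settings outside the safe set are never selected, so their visit counts stay at zero regardless of the exploration rate. Approaches such as shielding derive the allowed set from formal requirements rather than from a fixed list as assumed here.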

How to Cite

KOMP, John et al. Strategies and Considerations for Safe Reinforcement Learning in Programming Cardiac Implantable Electronic Devices. Medical Research Archives, [S.l.], v. 13, n. 3, mar. 2025. ISSN 2375-1924. Available at: <https://esmed.org/MRA/mra/article/view/6363>. Date accessed: 24 feb. 2026. doi: https://doi.org/10.18103/mra.v13i3.6363.

Issue

Vol. 13 No. 3 (2025): March 2025

Section

Research Articles

The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.
