Human–AI Clinical Decision Support for Heart Disease Risk Prediction Using Interpretable and Reliable Machine Learning

Wisam Bukaita

doi:10.18103/mra.v14i5.7474

Medical Research Archives - Volume 14, Issue 5, May 2026


Published May 25, 2026



Abstract

This study presents a reliability-centered and decision-aware Human–AI clinical decision-support framework for cardiovascular risk prediction using structured clinical data. Unlike conventional machine learning approaches that prioritize discrimination metrics alone, the proposed framework formulates clinical prediction as a multi-dimensional reliability optimization problem, jointly modeling discrimination, probabilistic calibration, subgroup consistency, and robustness under dataset shift. Using a benchmark dataset of 918 patients with independent external validation on the UCI Cleveland cohort (n = 303), multiple machine learning models including Logistic Regression, Random Forest, XGBoost, and CatBoost were evaluated under a unified, leakage-safe protocol. While all models achieved strong internal discrimination (AUC ≥ 0.92), statistical testing revealed no significant differences (p > 0.05), highlighting the limitations of accuracy-centric model selection. External validation demonstrated substantial variability in generalization, with Random Forest achieving the strongest performance (AUC = 0.988), indicating superior robustness under distributional shift. To address limitations of single-metric evaluation, a composite reliability score is introduced to aggregate discrimination, calibration, fairness, and robustness into a unified evaluation framework. Calibration analysis shows that raw model probabilities outperform post-hoc calibration methods (Brier = 0.111, ECE = 0.048), emphasizing the dataset-dependent nature of probabilistic reliability. Subgroup analysis further reveals heterogeneity in calibration performance across patient populations, underscoring the importance of fairness-aware evaluation. Beyond predictive performance, the framework integrates decision-aware modeling through threshold-based risk stratification and Decision Curve Analysis (DCA), enabling optimization with respect to clinical net benefit rather than accuracy alone. 
The proposed system is further operationalized through a deployment-oriented interface, demonstrating how reliability-aware machine learning can be translated into an interactive clinical decision-support tool with interpretable outputs and actionable recommendations. Collectively, this work advances clinical machine learning from an accuracy-centric paradigm toward a reliability- and utility-driven framework, providing a principled foundation for developing robust, interpretable, and clinically deployable AI systems.
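
The calibration and decision-curve quantities named in the abstract (Brier score, ECE, and clinical net benefit) can be illustrated with a minimal sketch. This is not the study's code: the function names, the equal-width 10-bin ECE variant, and the toy labels and probabilities are all assumptions made for illustration. It uses the standard definitions, with net benefit at threshold p_t computed as TP/n - (FP/n) * p_t / (1 - p_t).

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """ECE over equal-width bins (an assumed variant; binning schemes differ).

    Each bin contributes |observed event rate - mean predicted probability|,
    weighted by the share of samples that fall in the bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the chosen threshold.
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical, not taken from the study).
labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.3]
print(brier_score(labels, probs))
print(expected_calibration_error(labels, probs))
print(net_benefit(labels, probs, threshold=0.2))
```

Evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies gives a decision curve; the threshold range that is clinically meaningful depends on the setting and is not specified here.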

Keywords: Heart disease prediction, Clinical decision support, Interpretable machine learning, SHAP, Calibration, ROC-AUC, External validation, Fairness, CatBoost, XGBoost

How to Cite

BUKAITA, Wisam. Human–AI Clinical Decision Support for Heart Disease Risk Prediction Using Interpretable and Reliable Machine Learning. Medical Research Archives, [S.l.], v. 14, n. 5, may 2026. ISSN 2375-1924. Available at: <https://esmed.org/MRA/mra/article/view/7474>. Date accessed: 16 july 2026. doi: https://doi.org/10.18103/mra.v14i5.7474.


Issue

Vol 14 No 5 (2026): Vol.14 Issue 5 May 2026

Section

Research Articles

The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.

