Human–AI Clinical Decision Support for Heart Disease Risk Prediction Using Interpretable and Reliable Machine Learning
Abstract
This study presents a reliability-centered, decision-aware Human–AI clinical decision-support framework for cardiovascular risk prediction from structured clinical data. Unlike conventional machine learning approaches that prioritize discrimination metrics alone, the proposed framework formulates clinical prediction as a multi-dimensional reliability optimization problem that jointly models discrimination, probabilistic calibration, subgroup consistency, and robustness under dataset shift. Using a benchmark dataset of 918 patients, with independent external validation on the UCI Cleveland cohort (n = 303), multiple machine learning models, including Logistic Regression, Random Forest, XGBoost, and CatBoost, were evaluated under a unified, leakage-safe protocol. Although all models achieved strong internal discrimination (AUC ≥ 0.92), statistical testing revealed no significant differences between them (p > 0.05), highlighting the limitations of accuracy-centric model selection. External validation showed substantial variability in generalization, with Random Forest achieving the strongest performance (AUC = 0.988), indicating superior robustness under distributional shift. To address the limitations of single-metric evaluation, a composite reliability score is introduced that aggregates discrimination, calibration, fairness, and robustness into a unified evaluation framework. Calibration analysis shows that raw model probabilities outperform post-hoc calibration methods (Brier = 0.111, ECE = 0.048), emphasizing the dataset-dependent nature of probabilistic reliability. Subgroup analysis further reveals heterogeneity in calibration performance across patient populations, underscoring the importance of fairness-aware evaluation. Beyond predictive performance, the framework integrates decision-aware modeling through threshold-based risk stratification and Decision Curve Analysis (DCA), enabling optimization with respect to clinical net benefit rather than accuracy alone.
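The abstract reports calibration through the Brier score and the expected calibration error (ECE) without stating how they were computed. The following is a minimal sketch of the standard definitions, not the authors' implementation. The equal-width binning, the default of 10 bins, and the function names are assumptions made for illustration.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (an assumed binning scheme).

    For each non-empty bin, take the gap between the observed event rate and
    the mean predicted probability, then average the gaps weighted by bin size.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely to exercise the functions.
    labels = [0, 0, 1, 1]
    probs = [0.15, 0.35, 0.65, 0.85]
    print(brier_score(labels, probs))
    print(expected_calibration_error(labels, probs))
```

Lower values are better for both metrics. ECE depends on the number and placement of bins, so values are only comparable when the binning is held fixed.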
The proposed system is further operationalized through a deployment-oriented interface, demonstrating how reliability-aware machine learning can be translated into an interactive clinical decision-support tool with interpretable outputs and actionable recommendations. Collectively, this work advances clinical machine learning from an accuracy-centric paradigm toward a reliability- and utility-driven framework, providing a principled foundation for developing robust, interpretable, and clinically deployable AI systems.
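Decision Curve Analysis, as named in the abstract, compares a model against default strategies by net benefit at a chosen risk threshold. The sketch below uses the standard net-benefit formula, TP/n - (FP/n) * pt / (1 - pt). It is not taken from the article, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the 'treat everyone' reference strategy."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 0, 1, 0]
    probs = [0.9, 0.7, 0.6, 0.2, 0.3, 0.1]
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

The "treat none" strategy has a net benefit of zero at every threshold. A model is considered useful at a given threshold only if its net benefit exceeds both reference strategies.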
Article Details
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.