Identifying Biomarkers of Cardiovascular Diseases with Machine Learning: Evidence from The UK Household Longitudinal Study

Main Article Content

Vasilis Nikolaou, Sebastiano Massaro, Masoud Fakhimi, Wolfgang Garn


Cardiovascular diseases are a significant global health concern, responsible for one-third of deaths worldwide and posing a substantial burden on society and national healthcare systems. To address this challenge and develop targeted intervention strategies, it is critical to be able to predict cardiovascular diseases from standardized assessments, such as occupational health encounters or national surveys. This study aims to assist these efforts by identifying a set of biomarkers which, together with known risk factors, can predict the onset of cardiovascular diseases. We used a sample of 7,767 individuals from the UK Household Longitudinal Study ‘Understanding Society’ to train several machine learning models that pinpoint biomarkers and risk factors at baseline predicting cardiovascular diseases at a ten-year follow-up. A logistic regression model was trained for comparison. A Gaussian naïve Bayes classifier achieved 82% recall, compared with 48% for the logistic regression, allowing us to identify the most prominent biomarkers predicting cardiovascular diseases. These findings show that machine learning can identify a wide range of previously overlooked biomarkers associated with the onset of cardiovascular diseases, and they encourage the use of such a model for early diagnosis and prevention of cardiovascular diseases in future research and practice.
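The recall figures quoted above can be read as the share of people who go on to develop cardiovascular disease that a model correctly flags at baseline. The following is a minimal sketch of that arithmetic, not the authors' code: the `recall` helper, the toy cohort of 1,000 people, and the labels "model A" and "model B" are hypothetical and chosen only so that the two values match the 82% and 48% reported in the abstract.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that are predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy cohort (an assumption, not study data):
# 100 people who develop CVD (1) and 900 who do not (0).
y_true = [1] * 100 + [0] * 900

# Hypothetical model A flags 82 of the 100 cases; model B flags 48.
pred_a = [1] * 82 + [0] * 18 + [0] * 900
pred_b = [1] * 48 + [0] * 52 + [0] * 900

recall_a = recall(y_true, pred_a)
recall_b = recall(y_true, pred_b)
print(f"model A recall: {recall_a:.2f}")
print(f"model B recall: {recall_b:.2f}")
```

In this toy setting, the higher-recall model misses 18 cases out of 100 while the lower-recall model misses 52. Recall alone says nothing about false alarms, so it is usually read alongside precision or specificity.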

Keywords: Machine Learning, Naïve Bayes, Logistic Regression, Biomarkers, Cardiovascular Diseases

Article Details

How to Cite
NIKOLAOU, Vasilis et al. Identifying Biomarkers of Cardiovascular Diseases with Machine Learning: Evidence from The UK Household Longitudinal Study. Medical Research Archives, [S.l.], v. 12, n. 1, jan. 2024. ISSN 2375-1924. Available at: <>. Date accessed: 03 mar. 2024. doi:
Research Articles

