Mathematical Analysis of Statistical Design of Experiment and Machine Learning Methods in Identifying Factors Influencing Obesity
Abstract
Introduction: This paper explores a mathematical framework for identifying factors influencing obesity by comparing statistical design of experiments and machine learning (ML) approaches.
Methods: A low-calorie program was applied to 100 overweight to morbidly obese patients monitored over 8 visits spanning four months or more. A traditional three-factor experimental design was employed to evaluate the impact of glucose, the alanine aminotransferase (ALT) enzyme, and cholesterol levels on obesity. ML methods (Multiple Linear Regression, Random Forest, Decision Tree Classifier, Gradient Boosting Regressor, and XGBoost) were employed to evaluate the impact of glucose, ALT, cholesterol levels, body mass, blood pressure, and sex on obesity.
Results: The three-factor experiment indicated that glucose had the greatest impact on obesity, followed by cholesterol and ALT, with effects particularly significant in females. The ML models, achieving over 90% accuracy and RMSE below 1.5, corroborated these findings and additionally highlighted the role of blood pressure (see the illustrative sketch after the abstract).
Conclusion: Both statistical and ML models aim to understand relationships between variables and predict outcomes, differing in assumptions, flexibility, and interpretability. Statistical methods offer high interpretability and rigorous testing, while ML provides flexibility and robust performance with complex data.
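To make the modelling step concrete, the listing below is a minimal, illustrative sketch (not the authors' code or data): it fits three of the regressors named in the Methods to a small synthetic dataset with the predictors listed in the abstract and reports hold-out RMSE. All column names, the synthetic response, and the train/test split are assumptions made purely for illustration; the Decision Tree Classifier and XGBoost models mentioned in the Methods could be slotted into the same loop in the same way.

# Illustrative sketch only: synthetic data standing in for the 100-patient cohort.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical predictors mirroring those listed in the abstract (names assumed).
features = ["glucose", "alt", "cholesterol", "body_mass", "blood_pressure", "sex"]
target = "bmi"  # assumed continuous obesity outcome

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, len(features))), columns=features)
df["sex"] = rng.integers(0, 2, size=100)  # binary-coded sex
df[target] = (
    30 + 2.0 * df["glucose"] + 0.8 * df["cholesterol"]
    + rng.normal(scale=1.0, size=100)
)  # synthetic response with a dominant glucose effect

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.2f}")

Here RMSE is the square root of the mean squared error on the held-out 20% of patients, matching the error metric quoted in the Results.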
Article Details
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.