Mathematical Analysis of Statistical Design of Experiment and Machine Learning Methods in Identifying Factors Influencing Obesity


Vesna Knights, Tatjana Blazevska, Gordana Markovic, Jasenka Gajdoš Kljusurić

Abstract

Introduction: This paper explores a mathematical framework for identifying the factors that influence obesity by comparing statistical design of experiments and machine learning (ML) approaches.


Methods: A low-calorie program was applied to 100 overweight to morbidly obese patients monitored over eight visits during a period of four months or longer. A traditional three-factor experimental design was used to evaluate the impact of glucose, alanine aminotransferase (ALT) enzyme, and cholesterol levels on obesity. ML methods (Multiple Linear Regression, Random Forest, Decision Tree Classifier, Gradient Boosting Regressor, and XGBoost) were applied to evaluate the impact of glucose, ALT enzyme, cholesterol levels, body mass, blood pressure, and sex on obesity.
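A minimal sketch of how such an ML comparison could be set up in Python is given below. The column names (glucose, alt, cholesterol, blood_pressure, sex), the synthetic data, and the hyperparameters are illustrative assumptions, not the study's actual dataset or settings.

```python
# Illustrative sketch only: synthetic data stands in for the study's records.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import mean_squared_error, accuracy_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 800  # e.g., 100 patients x 8 visits
X = pd.DataFrame({
    "glucose": rng.normal(5.5, 1.0, n),            # mmol/L
    "alt": rng.normal(30.0, 10.0, n),              # U/L
    "cholesterol": rng.normal(5.2, 0.9, n),        # mmol/L
    "blood_pressure": rng.normal(130.0, 15.0, n),  # systolic, mmHg
    "sex": rng.integers(0, 2, n),                  # 0 = male, 1 = female
})
# Hypothetical continuous outcome (e.g., BMI) dominated by glucose and cholesterol.
y = (10 + 2.0 * X["glucose"] + 1.2 * X["cholesterol"] + 0.03 * X["alt"]
     + 0.02 * X["blood_pressure"] + rng.normal(0, 1.0, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Regression models compared by RMSE on the held-out split.
regressors = {
    "Multiple Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "XGBoost": XGBRegressor(n_estimators=300, random_state=0),
}
for name, model in regressors.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse:.2f}")

# Decision Tree Classifier on a binary obesity class derived from the outcome.
y_cls = (y > y.median()).astype(int)
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_tr, y_cls[y_tr.index])
print(f"Decision Tree accuracy = {accuracy_score(y_cls[y_te.index], clf.predict(X_te)):.2f}")
```

Feature importances from the tree-based models (e.g., the fitted models' feature_importances_ attribute) could then be inspected to compare the relative contributions of the predictors.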


Results: The three-factor experiment indicated that glucose had the greatest impact on obesity, followed by cholesterol and ALT, with effects particularly significant in females. The ML models, with over 90% accuracy and RMSE below 1.5, corroborated these findings and additionally highlighted the role of blood pressure.
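For the design-of-experiments side, a minimal sketch of how main effects in a three-factor (2^3 factorial) design could be estimated and ranked is shown below; the coded factor levels, the response, and the effect sizes are synthetic assumptions for illustration, not the study's data.

```python
# Illustrative sketch: ranking main effects in a replicated 2^3 factorial design.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
# Coded factor levels (-1 = low, +1 = high) for glucose, ALT and cholesterol,
# replicated 10 times (80 runs in total).
levels = [-1, 1]
design = pd.DataFrame(
    [(g, a, c) for g in levels for a in levels for c in levels] * 10,
    columns=["glucose", "alt", "cholesterol"],
)
# Hypothetical response (e.g., weight change) with glucose as the dominant effect.
design["response"] = (
    3.0 * design["glucose"]
    + 1.5 * design["cholesterol"]
    + 0.8 * design["alt"]
    + rng.normal(0, 1.0, len(design))
)

# Full factorial model with all two- and three-way interactions.
model = smf.ols("response ~ glucose * alt * cholesterol", data=design).fit()
effects = model.params.drop("Intercept").abs().sort_values(ascending=False)
print(effects)          # effects ranked by magnitude
print(model.summary())  # t-tests / p-values for each term
```

Splitting the data by sex and refitting the same model would give the kind of sex-specific comparison reported above.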


Conclusion: Both statistical and ML models aim to understand relationships between variables and predict outcomes, differing in assumptions, flexibility, and interpretability. Statistical methods offer high interpretability and rigorous testing, while ML provides flexibility and robust performance with complex data.

Keywords: Mathematical modeling, Three-factor model, Optimization, Machine learning, Obesity

Article Details

How to Cite
KNIGHTS, Vesna et al. Mathematical Analysis of Statistical Design of Experiment and Machine Learning Methods in Identifying Factors Influencing Obesity. Medical Research Archives, [S.l.], v. 12, n. 9, sep. 2024. ISSN 2375-1924. Available at: <https://esmed.org/MRA/mra/article/view/5790>. Date accessed: 04 oct. 2024. doi: https://doi.org/10.18103/mra.v12i9.5790.
Section
Research Articles
