Mathematical Analysis of Statistical Design of Experiment and Machine Learning Methods in Identifying Factors Influencing Obesity
Abstract
Introduction: This paper explores a mathematical framework for identifying factors influencing obesity by comparing statistical design of experiments and machine learning (ML) approaches.
Methods: A low-calorie program was applied to 100 overweight to morbidly obese patients monitored over 8 visits spanning four months or more. A traditional three-factor experimental design was employed to evaluate the impact of glucose, the alanine aminotransferase (ALT) enzyme, and cholesterol levels on obesity. ML methods (Multiple Linear Regression, Random Forest, Decision Tree Classifier, Gradient Boosting Regressor, and XGBoost) were employed to evaluate the impact of glucose, ALT, cholesterol levels, body mass, blood pressure, and sex on obesity.
Results: The three-factor experiment indicated that glucose had the greatest impact on obesity, followed by cholesterol and ALT, with effects particularly significant in females. The ML models, achieving over 90% accuracy and RMSE below 1.5, corroborated these findings and additionally highlighted the role of blood pressure (see the illustrative sketch after the abstract).
Conclusion: Both statistical and ML models aim to understand relationships between variables and predict outcomes, differing in assumptions, flexibility, and interpretability. Statistical methods offer high interpretability and rigorous testing, while ML provides flexibility and robust performance with complex data.
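To make the modelling step concrete, the listing below is a minimal, illustrative sketch (not the authors' code or data): it fits three of the regressors named in the Methods to a small synthetic dataset with the predictors listed in the abstract and reports hold-out RMSE. All column names, the synthetic response, and the train/test split are assumptions made purely for illustration; the Decision Tree Classifier and XGBoost models mentioned in the Methods could be slotted into the same loop in the same way.

# Illustrative sketch only: synthetic data standing in for the 100-patient cohort.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical predictors mirroring those listed in the abstract (names assumed).
features = ["glucose", "alt", "cholesterol", "body_mass", "blood_pressure", "sex"]
target = "bmi"  # assumed continuous obesity outcome

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, len(features))), columns=features)
df["sex"] = rng.integers(0, 2, size=100)  # binary-coded sex
df[target] = (
    30 + 2.0 * df["glucose"] + 0.8 * df["cholesterol"]
    + rng.normal(scale=1.0, size=100)
)  # synthetic response with a dominant glucose effect

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.2f}")

Here RMSE is the square root of the mean squared error on the held-out 20% of patients, matching the error metric quoted in the Results.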
Article Details
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.