Gaussian Graphical Model Estimations in Multivariate Linear Regression: A method and applications in omics studies
Abstract
Introduction: Regression models for high-dimensional multivariate data curated from high-throughput biological assays in omics, brain networks, medical imaging, and psychometric instruments contain network features. Multivariate linear regression is a standard model that fits these data as response variables and the participant characteristics as explanatory variables. Often, the number of variates of the response variables is larger than the number of observations (p > n). To solve these problems, a structured covariance model is necessary to maintain the network feature of the response data, and sparsity induction is advantageous for reducing the number of unknown parameters in the large variance-covariance matrix.
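The rank deficiency behind this problem can be seen directly: with n observations of a p-variate response and n < p, the sample covariance matrix is singular, so its inverse (the precision matrix) does not exist, whereas a sparsity-inducing estimator yields a well-defined, positive-definite precision estimate. A minimal sketch in Python, using scikit-learn's `GraphicalLasso` as a stand-in for the GGM estimators discussed here and simulated data for illustration:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p = 20, 50                        # fewer observations than response variates (n < p)
Y = rng.normal(size=(n, p))          # simulated p-variate responses

# The p x p sample covariance has rank at most n - 1 = 19 < p: it is singular,
# so the sample precision matrix (its inverse) does not exist.
S = np.cov(Y, rowvar=False)
rank_S = np.linalg.matrix_rank(S)

# A sparsity-induced estimate (graphical lasso) is positive definite and invertible.
Omega = GraphicalLasso(alpha=0.5).fit(Y).precision_
```

The penalty `alpha` controls how aggressively off-diagonal precision entries are shrunk to zero, i.e. how sparse the implied network is.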
Method: This study investigated an approach to solving multivariate linear regression for multivariate-normal distributed response variables using a sparsity-induced latent precision matrix. The multivariate linear regression coefficients were derived from an algorithm that estimates the precision matrix as a plug-in parameter using different Gaussian Graphical Models. The Bioconductor tool “sparsenetgls”, developed from this algorithm, was applied to case studies of real omics datasets. Data simulations were also used to compare different Gaussian Graphical Model estimation methods in multivariate linear regression.
Results: GGM multivariate linear regression (GGM-MLS) advances multivariate regression. When the number of observations is smaller than the number of response variates (n < p), GGM-MLS tackles this challenge using sparsity induction in the covariance matrix. Analytical proof suggests that the estimation of the response variable's precision matrix and the regression coefficients of GGM-MLS are two independent processes. Simulation studies and case studies also consistently suggested that the regression coefficient estimates of GGM-MLS are similar to the estimates from linear mixed regression with only the variance terms in the covariance matrix. Furthermore, the GGM-MLS method reduces the variance (standard errors) of the regression coefficients in both n < p and n ≥ p scenarios.
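The independence of the two estimation processes can be checked numerically: when the error rows are i.i.d., generalised least squares with any plug-in precision matrix returns exactly the ordinary least squares coefficients, while the precision estimate only enters the standard errors through Cov(vec(B̂)) = Σ ⊗ (XᵀX)⁻¹. The sketch below illustrates that property on simulated data; it uses scikit-learn's `GraphicalLasso` for the sparse precision step and is not the sparsenetgls implementation itself:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
n, q, p = 60, 5, 8                    # n observations, q covariates, p response variates
X = rng.normal(size=(n, q))
Y = X @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))

# Coefficient step: ordinary least squares, column by column.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)                 # q x p

# Precision step: sparse GGM estimate from the residuals.
resid = Y - X @ B_hat
Omega = GraphicalLasso(alpha=0.2).fit(resid).precision_   # p x p, positive definite

# Decoupling check: vectorised GLS with weight (Omega ⊗ I_n)
# reproduces the OLS coefficients exactly, for any Omega.
Xk = np.kron(np.eye(p), X)                                # (n*p) x (q*p) design for vec(Y)
W = np.kron(Omega, np.eye(n))                             # plug-in GLS weight
y = Y.flatten(order="F")                                  # vec(Y), columns stacked
b_gls = np.linalg.solve(Xk.T @ W @ Xk, Xk.T @ W @ y)
assert np.allclose(b_gls, B_hat.flatten(order="F"))

# Standard errors do depend on the plug-in estimate:
# Cov(vec(B_hat)) = Sigma ⊗ (XᵀX)⁻¹ with Sigma the inverse precision matrix.
Sigma = np.linalg.inv(Omega)
XtX_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(np.kron(Sigma, XtX_inv))).reshape(p, q).T   # q x p
```

A sparser precision estimate thus changes the reported standard errors, but leaves the coefficient estimates untouched, consistent with the two-process result stated above.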
Article Details
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.
References
2 Cai M, Chen J, Hua C, Wen G, Fu R. EEG emotion recognition using EEG-SWTNS neural network through EEG spectral image. Information Sciences. 2024;680:121198. doi:10.1016/j.ins.2024.121198
3 Royer J, Rodríguez-Cruces R, Tavakol S, et al. An Open MRI Dataset For Multiscale Neuroscience. Scientific Data. 2022;9(1):569. doi:10.1038/s41597-022-01682-y
4 Borsboom D, Deserno MK, Rhemtulla M, et al. Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers. 2021;1(1):58. doi:10.1038/s43586-021-00055-w
5 Van der Vaart AW. M- and Z-Estimators. Asymptotic Statistics. Cambridge University Press; 1998.
6 Goldstein H. Multilevel Statistical Models. John Wiley & Sons; 2010.
7 Zeng IS, Lumley T, Ruggierol K, Middleditch M. A Bayesian approach to multivariate and multilevel modelling with non-random missingness for hierarchical clinical proteomics data. 2017.
8 Zeng IS. Topics in Study Design and Analysis for Multistage Clinical Proteomics Studies. In: Jung K, ed. Statistical Analysis in Proteomics. Springer; 2016.
9 Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. Statistics and Computing. Springer; 2007.
10 Miller RG. The jackknife – a review. Biometrika. 1974;61:1-15.
11 Yuan M, Lin Y. Model Selection and Estimation in the Gaussian Graphical Model. Biometrika. Mar 2007;94(1):19-35.
12 Friedman J, Hastie T, Simon N, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1):1-22.
13 Friedman J, Hastie T, Tibshirani R. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. 2010.
14 Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics. 2006;34(3):1436-1462.
15 Peng J, Wang P, Zhou N, Zhu J. Partial Correlation Estimation by Joint Sparse Regression Models. Journal of the American Statistical Association. 2009;104(486):735-746.
16 Cai T, Li H, Liu W, Xie J. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika. 2013;100:139-156.
17 Zhang J, Li Y. High Dimensional Gaussian Graphical Regression Models with Covariates. Journal of the American Statistical Association. 2022;0(0).
18 Devijver E, Gallopin M. Block-Diagonal Covariance Selection for High-Dimensional Gaussian Graphical Models. Journal of the American Statistical Association. 2018;113(521):306-314.
19 Bondy JA, Murty USR. Directed graphs. In: Graph Theory with Applications. Elsevier Science Publishing Co., Inc.; 1976.
20 Zhao T, Liu H. The huge Package for High-dimensional Undirected Graph Estimation in R. Journal of Machine Learning Research. 2012;13:1059-1062.
21 Hurley N, Rickard S. Comparing measures of sparsity. IEEE Transactions on Information Theory. 2009;55(10):4723-4741.
22 Mertins P, Mani DR, Ruggles KV, et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature. 2016;534:55-73.
23 National Library of Medicine.
24 Seillier M, Peuget S, Gayet O, et al. TP53INP1, a tumor suppressor, interacts with LC3 and ATG8-family proteins through the LC3-interacting region (LIR) and promotes autophagy-dependent cell death. Cell Death & Differentiation. 2012/09/01 2012; 19(9):1525-1535. doi:10.1038/cdd.2012.30