We briefly review our recently published approach to mining digenic genotype patterns, each of which consists of two genotypes originating in different DNA variants. For a genetic case-control study, we evaluate all possible pairs of genotypes, distributing the workload over numerous CPUs (threads) in a high-performance computing environment, and we apply our methods to two known datasets, age-related macular degeneration (AMD) and Parkinson disease (PD). Based on a list of genotype pairs (e.g., 100,000) with the largest frequency differences between cases and controls, we determine the number Nu of unique variants occurring in this list. For each unique variant, we find the number of genotype pairs it participates in, which identifies a set of variants “connected” with the given unique variant. Among the total of variants “connected” with all unique variants, only a subset is unique. The ratio of all connected variants to that subset is a measure of the overall density or connectedness of variants interacting with each other. We find that variants for the AMD data are much more interconnected than those for PD, at least based on the 100,000 genotype pairs with the largest chi-square values that we investigated. Further, for each of the Nu unique variants, we use the number of variants connected with it as a test statistic, weighted by the inverse of the rank at which the unique variant first occurred in the original list of genotype patterns. This weighting scheme ties the number of connections to the genetics of the trait and allows us to obtain, for each of the Nu unique variants, an empirical significance level by permuting ranks. We find 12 and 8 significant, highly connected variants for AMD and PD, respectively, some of which have previously been identified by other machine learning methods, which lends credence to our approach.
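The connectedness measure described above can be illustrated with a minimal Python sketch. It follows one possible reading of the text: each variant's connections are the distinct other variants it is paired with, and the ratio divides the summed connection counts by the number of distinct connected variants. The tuple layout, the function names, and the toy variant identifiers are assumptions made for illustration and are not taken from the published software.

```python
from collections import defaultdict

def connection_sets(pairs):
    """Map each variant to the set of other variants it is paired with.

    `pairs` is a ranked list of genotype pairs; each entry is assumed to be
    a tuple (variant_a, genotype_a, variant_b, genotype_b).
    """
    connected = defaultdict(set)
    for var_a, _geno_a, var_b, _geno_b in pairs:
        connected[var_a].add(var_b)
        connected[var_b].add(var_a)
    return dict(connected)

def connectedness_ratio(connected):
    """Summed connection counts divided by the number of distinct connected variants."""
    total = sum(len(partners) for partners in connected.values())
    distinct = len(set().union(*connected.values())) if connected else 0
    return total / distinct if distinct else 0.0

# Toy list of genotype pairs (hypothetical identifiers and genotypes).
toy_pairs = [
    ("rs1", "AA", "rs2", "AG"),
    ("rs1", "AA", "rs3", "GG"),
    ("rs1", "AG", "rs2", "AG"),  # same variant pair, different genotypes
    ("rs4", "CC", "rs5", "CT"),
]

connected = connection_sets(toy_pairs)
n_unique = len(connected)               # Nu, the number of unique variants
ratio = connectedness_ratio(connected)  # overall density under this reading
print(n_unique, ratio)
```

Under this reading the ratio equals the average number of connections per variant, so a larger value indicates a more densely interconnected set of variants.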
Among the 100,000 genotype pairs investigated for each of AMD and PD, significant variants showed connections with up to 7,093 and 3,777 other variants, respectively. Our approach has been implemented in a freely available piece of software, the Digenic Network Test. Thus, our statistical genetics method can provide important information on the genetic architecture of polygenic traits.
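The rank-weighted test statistic and its permutation-based significance level might be sketched as follows. This is an illustrative sketch under stated assumptions, not the Digenic Network Test implementation: the statistic is taken to be the number of connected variants divided by the rank of first occurrence, and "permuting ranks" is taken to mean shuffling the first-occurrence ranks among the unique variants. The function names, the input layout, and the add-one p-value convention are likewise assumptions.

```python
import random

def first_ranks_and_degrees(pairs):
    """Return each variant's rank of first occurrence and its connection count.

    `pairs` is a ranked list of (variant_a, variant_b) tuples, best pair first.
    """
    first_rank = {}
    partners = {}
    for rank, (var_a, var_b) in enumerate(pairs, start=1):
        for variant, other in ((var_a, var_b), (var_b, var_a)):
            first_rank.setdefault(variant, rank)  # keep the earliest rank only
            partners.setdefault(variant, set()).add(other)
    degree = {variant: len(p) for variant, p in partners.items()}
    return first_rank, degree

def permutation_p_values(first_rank, degree, n_perm=1000, seed=0):
    """Empirical p-value per variant, obtained by shuffling first-occurrence ranks."""
    rng = random.Random(seed)
    variants = list(first_rank)
    # Observed statistic: connection count weighted by the inverse rank.
    observed = {v: degree[v] / first_rank[v] for v in variants}
    exceed = {v: 0 for v in variants}
    ranks = [first_rank[v] for v in variants]
    for _ in range(n_perm):
        rng.shuffle(ranks)
        for v, r in zip(variants, ranks):
            if degree[v] / r >= observed[v]:
                exceed[v] += 1
    # Add-one correction keeps every p-value strictly above zero.
    p_values = {v: (exceed[v] + 1) / (n_perm + 1) for v in variants}
    return observed, p_values

# Toy ranked list of variant pairs (hypothetical identifiers).
toy_pairs = [("rs1", "rs2"), ("rs1", "rs3"), ("rs1", "rs4"), ("rs5", "rs6")]
first_rank, degree = first_ranks_and_degrees(toy_pairs)
observed, p_values = permutation_p_values(first_rank, degree, n_perm=1000, seed=1)
for variant in sorted(observed, key=observed.get, reverse=True):
    print(variant, round(observed[variant], 3), round(p_values[variant], 3))
```

With real data the number of permutations would need to be large enough to resolve the significance threshold in use, and some correction for testing many variants would likely be required.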
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.