HLA-genotype-based Predictive Diagnosis of T-cell Responses to SARS-CoV-2 Infection Powered by Machine Learning
Abstract
Background: The COVID-19 pandemic has created a need for efficient diagnostic tools that predict T-cell responses, which are crucial for viral clearance and protection against reinfection. Current diagnostic tests cannot predict which epitopes in an individual's repertoire induce T-cell responses.
Methods: We developed VERDI, a new machine learning-based diagnostic tool that uses the sequences of all six HLA class I alleles of an individual to rank every putative epitope by its potential to induce T-cell responses. VERDI was trained on a comprehensive clinical dataset of 920 SARS-CoV-2 epitopes and validated on an independent dataset collected for the FDA-authorized T-Detect COVID test. We compared VERDI's performance with that of existing HLA-allele-based models using statistical analyses.
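VERDI's model internals are not described here. As a minimal sketch, assuming a hypothetical per-allele scoring function `score(allele, peptide)` and a max-over-alleles aggregation rule (both illustrative assumptions, not VERDI's published method), ranking one individual's putative epitopes from their six HLA class I alleles could look like this:

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: returns a potency-like score for one (allele, peptide) pair.
AlleleScorer = Callable[[str, str], float]


def rank_epitopes(
    genotype: Sequence[str],
    peptides: Sequence[str],
    score: AlleleScorer,
) -> List[Tuple[str, float]]:
    """Rank one individual's putative epitopes using the full HLA class I genotype.

    Each peptide is scored against all six alleles and keeps its best
    allele score (an assumed aggregation rule, not necessarily VERDI's).
    """
    if len(genotype) != 6:
        raise ValueError("expected six HLA class I alleles (two each of HLA-A, -B, -C)")
    scored = [(pep, max(score(allele, pep) for allele in genotype)) for pep in peptides]
    # Highest score first; ties broken by peptide string so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy example: invented scores and placeholder peptide names, not real predictions.
TOY_SCORES = {("A*02:01", "pepA"): 0.9, ("B*07:02", "pepB"): 0.8}


def toy_scorer(allele: str, peptide: str) -> float:
    return TOY_SCORES.get((allele, peptide), 0.0)


genotype = ["A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]
ranked = rank_epitopes(genotype, ["pepC", "pepB", "pepA"], toy_scorer)
print(ranked)  # [('pepA', 0.9), ('pepB', 0.8), ('pepC', 0.0)]
```

Taking the maximum over alleles encodes the simple idea that a peptide needs to be presented by only one of the individual's HLA molecules. A learned model that takes all six alleles jointly as input, as the Methods describe, would replace this fixed rule.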
Results: VERDI's top-ranked epitopes accurately represented the epitope repertoire that participates in an individual's T-cell responses. VERDI outperformed current models, with a threefold improvement in the recall and an eightfold improvement in the precision of T-cell response prediction. It showed high diagnostic accuracy, precision, and recall in predicting the potency of the top 20 epitopes. Although experimental constraints allowed only 1% of putative epitopes to be tested, VERDI correctly predicted 30% of these, suggesting that its true accuracy could be higher with broader testing. Notably, the mean potency of VERDI's top-ranked epitopes, which reflects the strength of an individual's SARS-CoV-2-specific T-cell response, followed a Gaussian distribution across individuals.
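The top-20 evaluation and the per-individual response strength can be illustrated with the standard precision@k and recall@k definitions and a simple mean over the k highest-ranked epitopes. This is a sketch that assumes these textbook definitions and a per-individual set of experimentally confirmed epitopes; the exact formulas used are not spelled out here, and `precision_recall_at_k` and `mean_top_k_potency` are hypothetical helper names.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], confirmed: Set[str], k: int = 20
) -> Tuple[float, float]:
    """Standard precision@k and recall@k against experimentally confirmed epitopes.

    Untested epitopes are counted as misses, so when only a small fraction of
    putative epitopes has been assayed, this precision is a lower bound.
    """
    top_k = list(ranked[:k])
    hits = sum(1 for epitope in top_k if epitope in confirmed)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(confirmed) if confirmed else 0.0
    return precision, recall


def mean_top_k_potency(potencies: Sequence[float], k: int = 20) -> float:
    """Mean predicted potency of an individual's k highest-ranked epitopes."""
    top = sorted(potencies, reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


# Toy example with invented labels and scores (k=2 keeps it small).
precision, recall = precision_recall_at_k(["e1", "e2", "e3", "e4"], {"e2", "e4", "e9"}, k=2)
strength = mean_top_k_potency([0.25, 1.0, 0.5, 0.75], k=2)
print(precision, recall, strength)
```

Treating untested epitopes as negatives is the conservative choice: because only about 1% of putative epitopes could be assayed, some top-ranked "misses" may simply be unmeasured, which is why measured precision understates what broader testing might show.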
Conclusions: VERDI is the first diagnostic tool to use an individual's complete HLA class I genotype to predict the breadth and strength of T-cell responses to SARS-CoV-2 infection. Its accurate identification of the potency of epitopes involved in individual T-cell responses, together with its superior performance over state-of-the-art models, makes it a new resource for personalized vaccine design and disease management.
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.