Biomarkers in clinical practice: opportunities and challenges
Abstract
Many proposed biomarkers fail to produce clinically actionable results. Simply put, the research problem addressed here is: why do most biomarker projects fail? In this contribution we describe four commonly encountered problems and outline procedures that address them. The specific issues are as follows:
- A statistically significant result in a between-group hypothesis test often does not translate into classification success (illustrated in the first sketch after this list).
- Cross-validation is commonly used in model validation, but its successive steps expose it to multiple sources of failure that can lead to erroneous conclusions of success (second sketch below).
- Failure to rigorously establish the test-retest reliability of a biomarker panel precludes its use in longitudinal monitoring of treatment response or disease progression. Further, the minimum detectable difference should not be mistaken for the minimal clinically important difference (third sketch below).
- Sample size estimates used in the design of a clinical study must be determined by the objectives of the study. Reliability studies and evaluations of biomarkers as prodromes therefore require sample sizes far larger than those computed for the purpose of hypothesis testing (fourth sketch below).
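The first point can be made concrete with a small simulation. This is a minimal sketch of our own in Python (not code from any cited study); the effect size, sample sizes, and threshold rule are illustrative assumptions. With a modest standardized group difference, a two-sample t-test is routinely significant while single-subject classification barely exceeds chance.

```python
# Sketch: a "significant" group difference with weak single-subject classification.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200                       # subjects per group (illustrative)
d = 0.3                       # standardized effect size, modest but detectable
controls = rng.normal(0.0, 1.0, n)
patients = rng.normal(d, 1.0, n)

res = stats.ttest_ind(patients, controls)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")   # typically p < 0.05 at this n

# Classify each subject by thresholding the biomarker at the pooled midpoint.
threshold = (controls.mean() + patients.mean()) / 2
labels = np.concatenate([np.zeros(n), np.ones(n)])
scores = np.concatenate([controls, patients])
accuracy = np.mean((scores > threshold) == labels)
print(f"classification accuracy = {accuracy:.2f}")        # usually only about 0.55-0.60
```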
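The cross-validation pitfall in the second point is often one of information leakage. The following sketch is again our own illustration, assuming scikit-learn and pure-noise simulated features: it contrasts feature selection performed on the full dataset before cross-validation with selection refit inside each training fold.

```python
# Sketch: feature selection outside vs. inside the cross-validation loop.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2000))        # 60 subjects, 2000 candidate features, no real signal
y = rng.integers(0, 2, 60)

# Leaky: select the "best" 20 features using all subjects, then cross-validate.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_sel, y, cv=5).mean()

# Honest: selection is refit inside every training fold of the cross-validation.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy  = {leaky:.2f}")    # well above chance despite pure noise
print(f"honest CV accuracy = {honest:.2f}")   # near 0.5, as it should be
```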
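For the third point, a common way to quantify test-retest reliability is an intraclass correlation coefficient, from which a standard error of measurement and a minimum detectable change can be derived. The sketch below uses simulated two-session data and one common convention (ICC(2,1) from two-way ANOVA mean squares; SEM = SD·sqrt(1 − ICC); MDC95 = 1.96·sqrt(2)·SEM). The resulting minimum detectable difference is a statistical quantity only; it says nothing about clinical importance.

```python
# Sketch: ICC(2,1), standard error of measurement, and minimum detectable change.
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 2                                              # 50 subjects, 2 sessions (illustrative)
true_score = rng.normal(0.0, 1.0, n)
Y = true_score[:, None] + rng.normal(0.0, 0.5, (n, k))    # simulated session measurements

grand = Y.mean()
subj = Y.mean(axis=1)
sess = Y.mean(axis=0)
MSR = k * ((subj - grand) ** 2).sum() / (n - 1)                         # between-subjects mean square
MSC = n * ((sess - grand) ** 2).sum() / (k - 1)                         # between-sessions mean square
MSE = ((Y - subj[:, None] - sess[None, :] + grand) ** 2).sum() / ((n - 1) * (k - 1))

icc = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)         # ICC(2,1), absolute agreement
sem = Y.std(ddof=1) * np.sqrt(1 - icc)                                  # standard error of measurement
mdc95 = 1.96 * np.sqrt(2) * sem                                         # minimum detectable change (95%)
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")
```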
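For the fourth point, one illustration of why estimation objectives demand larger samples: the number of positive cases needed merely to pin down a classifier's sensitivity to a useful precision already runs to a few hundred, well beyond typical power calculations for a between-group test. The target sensitivity of 0.80 and the ±0.05 half-width below are assumptions chosen for illustration; the interval is the exact (Clopper-Pearson) binomial confidence interval.

```python
# Sketch: smallest n for which the exact 95% CI around an observed sensitivity of 0.80
# is no wider than +/- 0.05.
from scipy import stats

target_halfwidth = 0.05
p_expected = 0.80
n = 10
while True:
    successes = round(p_expected * n)
    ci = stats.binomtest(successes, n).proportion_ci(confidence_level=0.95, method="exact")
    if (ci.high - ci.low) / 2 <= target_halfwidth:
        break
    n += 1
print(f"positive cases required: n = {n}")   # a few hundred, for this precision alone
```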
We conclude with suggestions for transparency and collaboration that would facilitate the use of biomarkers in clinical practice.