MURAL - Maynooth University Research Archive Library



    Why have so few proteomic biomarkers “survived” validation? (Sample size and independent validation considerations)


    Hernández, Belinda and Parnell, Andrew and Pennington, Stephen R. (2014) Why have so few proteomic biomarkers “survived” validation? (Sample size and independent validation considerations). Proteomics, 14 (13-14). pp. 1587-1592. ISSN 1615-9853





    Abstract

    Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these candidate biomarkers reach clinical validation and go on to be routinely used in clinical practice. One particular issue with biomarker discovery is that proteins identified as significantly changing in the initial discovery experiment do not validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC-MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms run on data with low sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (e.g. via ANOVA and/or correction methods such as Bonferroni or false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher apparent performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we have run simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that the above problems are substantially reduced if a sufficient number of samples is analyzed and the data are not prefiltered. Our view is that LC-MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.
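The prefiltering effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation and makes several assumptions of its own: it uses pure-noise data (no true biomarkers), a simple class-mean-gap filter in place of ANOVA, a nearest-centroid classifier in place of random forests, and leave-one-out cross-validation. All function names and parameter values are hypothetical. It compares the cross-validated accuracy obtained when features are filtered once on all samples against the accuracy obtained when the filter is re-run inside each fold.

```python
import random
import statistics


def simulate_noise(n_samples, n_features, rng):
    """Pure-noise data: no feature carries information about the class label."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, arbitrary labels
    return X, y


def top_features(X, y, rows, k):
    """Indices of the k features with the largest class-mean gap, using only `rows`."""
    scored = []
    for j in range(len(X[0])):
        mean0 = statistics.fmean([X[i][j] for i in rows if y[i] == 0])
        mean1 = statistics.fmean([X[i][j] for i in rows if y[i] == 1])
        scored.append((abs(mean0 - mean1), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def predict_centroid(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one held-out row, restricted to `feats`."""
    dist = []
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        d = 0.0
        for j in feats:
            centroid = statistics.fmean([X[i][j] for i in members])
            d += (X[test_row][j] - centroid) ** 2
        dist.append(d)
    return 0 if dist[0] <= dist[1] else 1


def loo_accuracy(X, y, k, filter_inside_cv):
    """Leave-one-out accuracy with the filter applied before or inside the CV loop."""
    rows = list(range(len(X)))
    prefiltered = top_features(X, y, rows, k)  # sees every sample, held-out ones too
    correct = 0
    for t in rows:
        train = [i for i in rows if i != t]
        feats = top_features(X, y, train, k) if filter_inside_cv else prefiltered
        correct += int(predict_centroid(X, y, train, t, feats) == y[t])
    return correct / len(rows)


def compare(n_reps=5, n_samples=20, n_features=500, k=10, seed=0):
    """Average both accuracy estimates over several simulated noise datasets."""
    rng = random.Random(seed)
    pre, inside = [], []
    for _ in range(n_reps):
        X, y = simulate_noise(n_samples, n_features, rng)
        pre.append(loo_accuracy(X, y, k, filter_inside_cv=False))
        inside.append(loo_accuracy(X, y, k, filter_inside_cv=True))
    return statistics.fmean(pre), statistics.fmean(inside)


if __name__ == "__main__":
    prefiltered_acc, nested_acc = compare()
    print(f"filter before CV: {prefiltered_acc:.2f}")
    print(f"filter inside CV: {nested_acc:.2f}")
```

Because the data contain no signal, any accuracy well above chance is an artefact. With few samples and many features, filtering before cross-validation should report a much higher accuracy than filtering inside each fold, which is one mechanism by which candidate panels can look strong in discovery and then fail on an independent cohort.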

    Item Type: Article
    Keywords: Bioinformatics; Biomarker panels; Cross-validation; Proteomic discovery; Random forest; Sample size;
    Academic Unit: Faculty of Science and Engineering > Mathematics and Statistics
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Faculty of Social Sciences > Research Institutes > Irish Climate Analysis and Research Units, ICARUS
    Item ID: 19115
    Identification Number: https://doi.org/10.1002/pmic.201300377
    Depositing User: Andrew Parnell
    Date Deposited: 29 Oct 2024 13:19
    Journal or Publication Title: Proteomics
    Publisher: Wiley
    Refereed: Yes
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
