Codoñer, Francisco M., O'Dea, Shirley and Fares, Mario A. (2008) Reducing the false positive rate in the non-parametric analysis of molecular coevolution. BMC Evolutionary Biology, 8 (106). ISSN 1471-2148
Official URL: http://www.biomedcentral.com/1471-2148/8/106
Abstract
Background: The strength of selective constraints operating on amino acid sites of proteins is multifactorial. In fact, amino acid sites within proteins coevolve because of their functional and/or structural relationships. Several methods have been developed that attempt to account for the evolutionary dependencies between amino acid sites, and researchers have invested considerable effort in increasing their sensitivity. However, the difficulty of disentangling functional co-dependencies from historical covariation has fuelled scepticism about their power to detect biologically meaningful results. In addition, the biological parameters connecting linear sequence evolution to structure evolution remain elusive. For these reasons, most evolutionary studies aimed at identifying functional dependencies among protein domains have focused on the structural properties of proteins rather than on the information extracted from linear multiple sequence alignments (MSAs). Non-parametric methods for detecting coevolution have been reported to be especially prone to producing false positive results, depending on the properties of the MSAs. However, no formal statistical analysis has been performed to definitively test the differential effects of these properties on the sensitivity of such methods.
Results: Here we test the effect that variation in MSA properties has on the sensitivity of non-parametric methods for detecting coevolution. Specifically, we test how the size of the MSA (number of sequences), the mean pairwise amino acid distance per site and the strength of the coevolutionary signal affect the ability of non-parametric methods to detect coevolution. Our results indicate that all three factors have significant effects on the accuracy of non-parametric methods. Further, introducing statistical filters improves the sensitivity and increases the statistical power of the methods to detect functional coevolution. Statistical analysis of the physico-chemical properties of amino acid sites in the context of the protein structure reveals striking dependencies among amino acid sites. Results indicate a covariation trend in the hydrophobicity and molecular weight of amino acid sites when analysing a non-redundant set of 8000 protein structures. Using this biological information as a filter in coevolutionary analyses minimises the false positive rate of these methods. Application of these filters to three different proteins with known functional domains supports the importance of using biological filters to detect coevolution.
Conclusion: Coevolutionary analyses using non-parametric methods have proved difficult and highly prone to spurious results, depending on the properties of MSAs and on the strength of coevolution between amino acid sites. Applying statistical filters to the number of pairs detected as coevolving significantly reduces the number of artifactual results. Analysis of the physico-chemical properties of amino acid sites in the protein structure context reveals their structure-dependent covariation. Applying this known biological information to the analysis of covariation greatly enhances the functional coevolutionary signal and removes historical covariation. The simultaneous use of statistical and biological data is instrumental in detecting functional amino acid site dependencies and compensatory changes at the protein level.
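The abstract does not name the specific non-parametric statistic or statistical filter used. As a rough illustration only, the following is a minimal sketch that assumes mutual information (MI) between two MSA columns as a representative non-parametric coevolution score, and a column-shuffling permutation test as a stand-in statistical filter. Neither is claimed to be the authors' method, and the function names `mutual_information` and `permutation_p_value` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """MI (in nats) between two aligned MSA columns (one residue per sequence).

    Assumption: MI stands in here for a generic non-parametric coevolution score.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), n_xy in count_ab.items():
        p_xy = n_xy / n
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def permutation_p_value(col_a: Sequence[str], col_b: Sequence[str],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: share of shuffles whose MI reaches the observed MI.

    Shuffling col_b breaks any association between the two sites while keeping
    each column's amino acid composition, giving a null distribution.
    Assumption: this permutation test is a stand-in "statistical filter".
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance guards against floating-point ties
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

For example, two 16-sequence columns in which A always pairs with D and L always pairs with K give an MI of ln 2 (about 0.693 nats) and a small permutation p-value, whereas columns whose residues pair independently give an MI of 0 and a p-value of 1. Because an MSA with L sites yields L(L-1)/2 site pairs, a multiple-testing correction over these p-values would be needed before calling any pair coevolving.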
Item Type: | Article |
---|---|
Additional Information: | © 2008 Codoñer et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | false positive rate; non-parametric analysis; molecular coevolution; false discovery; coevolution; amino acid sites; proteins |
Academic Unit: | Faculty of Science and Engineering > Biology; Faculty of Science and Engineering > Research Institutes > Institute of Immunology |
Item ID: | 6934 |
Identification Number: | 10.1186/1471-2148-8-106 |
Depositing User: | Dr. Shirley O'Dea |
Date Deposited: | 01 Feb 2016 09:22 |
Journal or Publication Title: | BMC Evolutionary Biology |
Publisher: | BioMed Central Ltd |
Refereed: | Yes |
Funders: | Science Foundation Ireland (SFI), Marie Curie Actions |
URI: | https://mural.maynoothuniversity.ie/id/eprint/6934 |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |