Summary
Genomic data are growing exponentially, and a million genomes are expected to be fully sequenced within the coming decade. Improving and developing methods to analyze these genomes and reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (e.g. form complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes minute changes in proteins across species into account to identify protein co-evolution. Despite the success of this method, it remains unclear which set of parameters is required to make optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear whether pathway evolution or function should guide parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPPs built from 1028 genomes, testing parameters both separately and in various combinations of values. We identify several parameter sets that optimize performance for pathways with certain biological annotations. This work reveals the importance of choosing the right parameters, based on biological context, for optimized function prediction.

Availability and implementation
Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs.

Supplementary information
Supplementary data are available at Bioinformatics online.
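The abstract does not spell out how an NPP is built. The sketch below assumes one commonly described formulation, and it is not the CompareNPPs pipeline. Under this formulation, each protein's best-hit alignment score in every species is divided by the protein's score against itself and log2-transformed. Each species column is then z-scored to damp the effect of phylogenetic distance, and pairs of proteins are compared by Pearson correlation of their profiles. The function names `npp_matrix` and `pearson`, the `floor` value for absent hits and the toy scores are all hypothetical.

```python
import math


def npp_matrix(scores, self_scores, floor=1.0):
    """Hypothetical sketch of normalized phylogenetic profiles (NPPs).

    scores[p][s]   -- best-hit alignment score of protein p in species s (assumed input)
    self_scores[p] -- score of protein p aligned against itself (assumed positive)
    floor          -- assumed minimum score for absent hits, keeps log2 finite
    """
    # Step 1: normalize each hit by the protein's self-score, then log2-transform.
    profiles = [
        [math.log2(max(x, floor) / self_scores[p]) for x in row]
        for p, row in enumerate(scores)
    ]
    # Step 2: z-score each species column across all proteins (assumed
    # correction for phylogenetic distance); constant columns become 0.
    n_prot, n_species = len(profiles), len(profiles[0])
    for s in range(n_species):
        col = [profiles[p][s] for p in range(n_prot)]
        mean = sum(col) / n_prot
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n_prot)
        for p in range(n_prot):
            profiles[p][s] = (profiles[p][s] - mean) / sd if sd > 0 else 0.0
    return profiles


def pearson(a, b):
    """Pearson correlation of two profiles; 0.0 if either is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va > 0 and vb > 0 else 0.0


# Toy data (invented): proteins A and B are both lost in species 3 and 4,
# while protein C follows a different pattern.
scores = [
    [100, 90, 1, 1],   # protein A
    [200, 180, 1, 1],  # protein B
    [50, 1, 45, 40],   # protein C
]
self_scores = [100, 200, 50]
profiles = npp_matrix(scores, self_scores)

# A and B co-evolve, so they should correlate more strongly than A and C.
print(pearson(profiles[0], profiles[1]), pearson(profiles[0], profiles[2]))
```

Each choice in this sketch is a tunable parameter of the kind whose effect the abstract says should be evaluated: the floor for missing hits, the log transform, the per-species normalization and the similarity measure.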
Cross-species protein conservation patterns, shaped by natural selection, reflect the interplay between protein function, protein-protein interactions and evolution. Since the beginning of the genomic era, proteins have been characterized as either conserved or not conserved. This simple binary classification became outdated and superficial once data on protein orthologs became available for thousands of species. To enrich the language used to describe protein conservation patterns, and to understand their biological significance, we classified 20,294 human proteins by their conservation across 1096 species. Analyses of the conservation patterns of human proteins in different eukaryotic clades yielded extremely variable and rich patterns that had never been characterized or studied before. Using mathematical classifications, we defined seven conservation motifs that describe the evolution of human proteins: Steps, Critical, Lately Developed, Plateau, Clade Loss, Trait Loss and Gain. One type of motif, which we termed Gain, describes human proteins that are highly conserved in a small number of organisms but are absent from most other species. Interestingly, this pattern predicts 73 possible instances of horizontal gene transfer in eukaryotes. Overall, our work offers novel terms for conservation patterns and defines a new language intended to classify proteins based on evolution, reveal aspects of protein evolution, and improve the understanding of protein functions.