The homologous recombination repair (HRR) pathway repairs DNA double-strand breaks in an error-free manner. Mutations in HRR genes can result in increased mutation rate and genomic rearrangements, and are associated with numerous genetic disorders and cancer. Despite intensive research, the HRR pathway is not yet fully mapped. Phylogenetic profiling analysis, which detects functional linkage between genes using coevolution, is a powerful approach to identify factors in many pathways. Nevertheless, phylogenetic profiling has limited predictive power when analyzing pathways with complex evolutionary dynamics such as the HRR. To map novel HRR genes systematically, we developed clade phylogenetic profiling (CladePP). CladePP detects local coevolution across hundreds of genomes and points to the evolutionary scale (e.g., mammals, vertebrates, animals, plants) at which coevolution occurred. We found that multiscale coevolution analysis is significantly more biologically relevant and sensitive to detect gene function. By using CladePP, we identified dozens of unrecognized genes that coevolved with the HRR pathway, either globally across all eukaryotes or locally in different clades. We validated eight genes in functional biological assays to have a role in DNA repair at both the cellular and organismal levels. These genes are expected to play a role in the HRR pathway and might lead to a better understanding of missing heredity in HRR-associated cancers (e.g., heredity breast and ovarian cancer). Our platform presents an innovative approach to predict gene function, identify novel factors related to different diseases and pathways, and characterize gene evolution.
Mapping co-evolved genes via phylogenetic profiling (PP) is a powerful approach to uncover functional interactions between genes and to associate them with pathways. Despite many successful endeavors, the understanding of co-evolutionary signals in eukaryotes remains partial. Our hypothesis is that ‘Clades’, branches of the tree of life (e.g. primates and mammals), encompass signals that cannot be detected by PP using all eukaryotes. As such, integrating information from different clades should reveal local co-evolution signals and improve function prediction. Accordingly, we analyzed 1028 genomes in 66 clades and demonstrated that the co-evolutionary signal was scattered across clades. We showed that functionally related genes are frequently co-evolved in only parts of the eukaryotic tree and that clades are complementary in detecting functional interactions within pathways. We examined the non-homologous end joining pathway and the UFM1 ubiquitin-like protein pathway and showed that both demonstrated distinguished co-evolution patterns in specific clades. Our research offers a different way to look at co-evolution across eukaryotes and points to the importance of modular co-evolution analysis. We developed the ‘CladeOScope’ PP method to integrate information from 16 clades across over 1000 eukaryotic genomes and is accessible via an easy to use web server at http://cladeoscope.cs.huji.ac.il.
In silico drug discovery is a complex process requiring flexibility and ingenuity in method selection and a careful validation of work protocols. GPCR in silico drug discovery poses additional challenges due to the paucity of crystallographic data. This paper starts by reviewing selected GPCR in silico screening programs reported in the literature, including both structure-based and ligand-based approaches. Particular emphasis is given to library design, binding mode selection, process validation and compound selection for biological testing. Following literature review, we provide insights into in silico methodologies and process workflows used at EPIX to drive over 20 highly successful screening and lead optimization programs performed since 2001. Applications of the various methodologies discussed are demonstrated by examples from recent programs that have not yet been published.
Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and answer evolutionary questions. Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il.
Summary The exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context. Availability and implementation Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs. Supplementary information Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.