Biologists have long used model organisms to study human diseases, particularly when the model bears a close resemblance to the disease. We present a method that quantitatively and systematically identifies nonobvious equivalences between mutant phenotypes in different species, based on overlapping sets of orthologous genes from human, mouse, yeast, worm, and plant (212,542 gene-phenotype associations). These orthologous phenotypes, or phenologs, predict unique genes associated with diseases. Our method suggests a yeast model for angiogenesis defects, a worm model for breast cancer, mouse models of autism, and a plant model for the neural crest defects associated with Waardenburg syndrome, among others. Using these models, we show that SOX13 regulates angiogenesis, and that SEC23IP is a likely Waardenburg gene. Phenologs reveal functionally coherent, evolutionarily conserved gene networks, many predating the plant-animal divergence, capable of identifying candidate disease genes.
angiogenesis | bioinformatics | evolution | gene-phenotype associations | homology
Biochemical and molecular functions of a given protein are generally conserved between organisms; this observation is fundamental to biological research. For example, in x-ray crystallography studies, one can often choose the organism from which the protein is most easily crystallized to facilitate the study of the protein's biochemical function. On the other hand, even with a conserved gene, disruption of function may give rise to radically different phenotypic outcomes in different species. For example, mutating the human RB1 gene leads to retinoblastoma, a cancer of the retina, yet disrupting the nematode ortholog contributes to ectopic vulvae (1, 2). Thus, although a gene's "molecular" functions are conserved, the "organism-level" functions need not be. When a conserved gene is mutated, the resulting organism-level phenotype is an emergent property of the system.
This bedrock principle underlying the use of model organisms not only allows us to study important aspects of human biology using mice or frogs, but also permits exploration of inherently multicellular processes, such as cancer, using unicellular organisms like yeast. Within this paradigm, once a molecular function has been discovered in one organism, it should be predictable in other organisms: GSK3 homologs in yeast are kinases, and such GSK3 homologs in every other organism will generally be kinases. In contrast, the emergent organism-level phenotypes are far less predictable between organisms, in part because relationships between genes and phenotypes are many-to-many. Manipulation of GSK3 perturbs nutrient and stress signaling in yeast, anteroposterior patterning and segmentation in insects, dorsoventral patterning in frogs, and craniofacial morphogenesis in mice (3-5). Recognizing functionally equivalent organism-level phenotypes between model organisms can therefore be nonobvious, especially across large evolutionary distances. However, the ability to recognize equivalent phenotypes betwee...
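The overlap idea in the excerpt above can be illustrated with a small sketch. It assumes a hypergeometric test for scoring how surprising the overlap between two phenotypes' gene sets is, restricted to genes that have orthologs in both species; the excerpt itself does not name the statistic, and the function name and toy ortholog-group IDs below are hypothetical.

```python
from math import comb


def phenolog_pvalue(genes_a, genes_b, shared_orthologs):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    genes_a, genes_b: ortholog-group IDs linked to a phenotype in species A
        and species B (hypothetical inputs).
    shared_orthologs: all ortholog groups present in both species; genes
        outside this universe are ignored.
    """
    universe = set(shared_orthologs)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    n_total, n_a, n_b = len(universe), len(a), len(b)
    observed = len(a & b)

    # Probability of drawing at least `observed` genes of phenotype A
    # when sampling len(b) genes without replacement from the universe.
    upper = min(n_a, n_b)
    numerator = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(n_total, n_b)


# Toy data: ten shared ortholog groups, two phenotypes with three genes each.
toy_universe = {f"og{i}" for i in range(1, 11)}
human_disease = {"og1", "og2", "og3"}
yeast_phenotype = {"og2", "og3", "og7"}
p_value = phenolog_pvalue(human_disease, yeast_phenotype, toy_universe)
```

A small p-value would flag the pair as a candidate phenolog; in practice many phenotype pairs are compared, so some correction for multiple testing would be needed on top of this.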
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall.
The authors want to thank Jon Laurent and Kris McGary for some of the data used, and Li and Patra for making their code available. Most of Ambuj Tewari's contribution to this work happened while he was a postdoctoral fellow at the University of Texas at Austin.
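As a rough illustration of the Katz measure mentioned in the abstract above, the sketch below scores node pairs by summing walks of increasing length, damped by a factor beta per step. It is a generic truncated Katz score on a toy graph, not the authors' implementation: the adjacency matrix, the node ordering, and the values of `beta` and `max_len` are all assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def truncated_katz(adj, beta=0.1, max_len=3):
    """Return sum over l = 1..max_len of beta**l * A**l for adjacency matrix A."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A**l, starting at l = 1
    weight = beta                    # holds beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)
        weight *= beta
    return scores


# Toy heterogeneous network (hypothetical): nodes 0 and 1 are genes,
# node 2 is a trait. Gene 0 is linked to gene 1, and gene 1 to the trait.
toy_adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
katz = truncated_katz(toy_adj, beta=0.5, max_len=2)
gene0_trait_score = katz[0][2]  # nonzero only through the length-2 walk
```

Gene 0 has no direct link to the trait, yet it receives a score through its neighbor, which is the behavior that makes walk-based measures useful for link prediction.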
Highly active antiretroviral therapy (HAART) can reduce human immunodeficiency virus type 1 (HIV-1) viremia to clinically undetectable levels. Despite this dramatic reduction, some virus is present in the blood. In addition, a long-lived latent reservoir for HIV-1 exists in resting memory CD4+ T cells. This reservoir is believed to be a source of the residual viremia and is the focus of eradication efforts. Here, we use two measures of population structure (analysis of molecular variance and the Slatkin-Maddison test) to demonstrate that the residual viremia is genetically distinct from proviruses in resting CD4+ T cells but that proviruses in resting and activated CD4+ T cells belong to a single population. Residual viremia is genetically distinct from proviruses in activated CD4+ T cells, monocytes, and unfractionated peripheral blood mononuclear cells. The finding that some of the residual viremia in patients on HAART stems from an unidentified cellular source other than CD4+ T cells has implications for eradication efforts.
Background: Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such “orthologous phenotypes,” or “phenologs,” are examples of deep homology, and may be used to predict additional candidate disease genes.
Results: In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data (from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans), establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene–phenotype associations, as for the Arabidopsis response to vernalization phenotype.
Conclusions: We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
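The k nearest neighbor idea described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it uses Jaccard overlap as the similarity between a disease and each model-organism phenotype and sums those similarities as gene weights, whereas the abstract says only that several classifiers and weighting functions were compared. All names and gene identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(disease_genes, model_phenotypes, k=2):
    """Rank genes not yet linked to the disease using its k most similar phenotypes.

    disease_genes: set of ortholog-group IDs known for the disease.
    model_phenotypes: dict mapping phenotype name -> set of ortholog-group IDs.
    """
    # Pick the k model phenotypes whose gene sets overlap the disease most.
    neighbors = sorted(
        ((jaccard(disease_genes, genes), name) for name, genes in model_phenotypes.items()),
        reverse=True,
    )[:k]

    # Each neighbor votes for its remaining genes, weighted by its similarity.
    scores = {}
    for weight, name in neighbors:
        for gene in model_phenotypes[name] - disease_genes:
            scores[gene] = scores.get(gene, 0.0) + weight

    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical ortholog-group IDs and phenotype names).
toy_disease = {"g1", "g2"}
toy_phenotypes = {
    "worm_phenotype": {"g1", "g2", "g3"},
    "yeast_phenotype": {"g1", "g4"},
    "plant_phenotype": {"g9"},
}
ranked = rank_candidates(toy_disease, toy_phenotypes, k=2)
```

Combining several neighbors lets a gene supported by more than one similar phenotype rise above a gene supported by a single, weaker match.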