With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. Here we present the fourth version of the eggNOG database (available at http://eggnog.embl.de), which derives nonsupervised orthologous groups (NOGs) from complete genomes and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements to the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is supplemented with multiple sequence alignments and maximum-likelihood trees, and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).
Cell fate choice and commitment of multipotent progenitor cells to a differentiated lineage requires broad changes of their gene expression profile. But how progenitor cells overcome the stability of their gene expression configuration (attractor) to exit the attractor in one direction remains elusive. Here we show that commitment of blood progenitor cells to the erythroid or myeloid lineage is preceded by the destabilization of their high-dimensional attractor state, such that differentiating cells undergo a critical state transition. Single-cell resolution analysis of gene expression in populations of differentiating cells affords a new quantitative index for predicting critical transitions in a high-dimensional state space based on decrease of correlation between cells and concomitant increase of correlation between genes as cells approach a tipping point. The detection of “rebellious cells” that enter the fate opposite to the one intended corroborates the model of preceding destabilization of a progenitor attractor. Thus, early warning signals associated with critical transitions can be detected in statistical ensembles of high-dimensional systems, offering a formal theory-based approach for analyzing single-cell molecular profiles that goes beyond current computational pattern recognition, does not require knowledge of specific pathways, and could be used to predict impending major shifts in development and disease.
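As an illustration of the index described above, the two signals can be combined into a single number: the mean absolute gene-gene correlation divided by the mean cell-cell correlation. This is a minimal sketch, not the authors' implementation. The ratio form, the function name `transition_index`, the genes-by-cells matrix layout and the simulated data are all assumptions made for illustration; the abstract states only that correlation between cells falls while correlation between genes rises near a tipping point.

```python
import numpy as np


def transition_index(expr):
    """Ratio of mean |gene-gene correlation| to mean cell-cell correlation.

    expr: 2-D array with rows = genes and columns = cells (assumed layout).
    Assumes no gene and no cell has constant expression, since a constant
    vector has an undefined Pearson correlation.
    """
    expr = np.asarray(expr, dtype=float)
    gene_corr = np.corrcoef(expr)    # genes x genes, computed across cells
    cell_corr = np.corrcoef(expr.T)  # cells x cells, computed across genes
    # Use each unordered pair once and skip the diagonal of ones.
    gene_pairs = np.triu_indices_from(gene_corr, k=1)
    cell_pairs = np.triu_indices_from(cell_corr, k=1)
    mean_gene = np.mean(np.abs(gene_corr[gene_pairs]))
    mean_cell = np.mean(cell_corr[cell_pairs])
    return mean_gene / mean_cell


# Simulated (hypothetical) data: 200 genes measured in 50 cells.
rng = np.random.default_rng(0)
n_genes, n_cells = 200, 50
profile = rng.normal(size=(n_genes, 1))

# "Stable" population: every cell sits close to one shared profile, so
# cells are alike and genes fluctuate independently of one another.
stable = 5.0 * profile + 0.1 * rng.normal(size=(n_genes, n_cells))

# "Destabilized" population: a shared latent factor moves many genes
# together, while cells spread out along that factor.
loadings = rng.normal(size=(n_genes, 1))
factor = rng.normal(size=(1, n_cells))
critical = profile + loadings * factor + 0.1 * rng.normal(size=(n_genes, n_cells))

index_stable = transition_index(stable)
index_critical = transition_index(critical)
print("stable:", index_stable, "destabilized:", index_critical)
```

In this toy setup the index is larger for the destabilized population than for the stable one. With real single-cell data, the ratio becomes unreliable when the mean cell-cell correlation approaches zero, so it is worth inspecting the two averages separately as well.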
The spectacular escalation in complexity in early bilaterian evolution correlates with a strong increase in the number of microRNAs [1,2]. To explore the link between the birth of ancient microRNAs and body plan evolution, we set out to determine the ancient sites of activity of conserved bilaterian microRNA families in a comparative approach. We reason that any specific localization shared between protostomes and deuterostomes (the two major superphyla of bilaterian animals) should probably reflect an ancient specificity of that microRNA in their last common ancestor. Here, we investigate the expression of conserved bilaterian microRNAs in Platynereis dumerilii, a protostome retaining ancestral bilaterian features [3,4], in Capitella, another marine annelid, in the sea urchin Strongylocentrotus, a deuterostome, and in the sea anemone Nematostella, representing an outgroup to the bilaterians. Our comparative data indicate that the oldest known animal microRNA, miR-100, and the related miR-125 and let-7 were initially active in neurosecretory cells located around the mouth. Other sets of ancient microRNAs were first present in locomotor ciliated cells, specific brain centres, or, more broadly, one of four major organ systems: central nervous system, sensory tissue, musculature and gut. These findings reveal that microRNA evolution and the establishment of tissue identities were closely coupled in bilaterian evolution. Also, they outline a minimum set of cell types and tissues that existed in the protostome–deuterostome ancestor.
The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community.