Comparison of related genomes has emerged as a powerful lens for genome interpretation. Here, we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and report constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparison with experimental datasets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events, and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements, and ~1,000 primate- and human-accelerated elements. Overlap with disease-associated variants suggests our findings will be relevant for studies of human biology and health.
Stem cells are defined as self-renewing cell populations that can differentiate into multiple distinct cell types. However, hundreds of different human cell lines from embryonic, fetal, and adult sources have been called stem cells, even though they range from pluripotent cells, typified by embryonic stem cells, which are capable of virtually unlimited proliferation and differentiation, to adult stem cell lines, which can generate a far more limited repertory of differentiated cell types. The rapid increase in reports of new sources of stem cells and their anticipated value to regenerative medicine [1, 2] have highlighted the need for a general, reproducible method for classification of these... To sort the cell types, we used an unsupervised machine learning approach to cluster transcriptional profiles of the cell preparations into stable, distinct groups. Sparse nonnegative matrix factorization (sNMF) was adapted for this task by implementing a bootstrapping algorithm to find the most stable groupings (see also Supplementary Discussion 1) [4, 5]. The stability of the clustering [9] indicated that the dataset most likely contained about twelve different types of samples. The HANSE cell group consisted of transcriptional profiles derived from neurosurgical specimens following published protocols for multipotent neural progenitor derivation and propagation [10, 11]. These cells expressed markers that are commonly used to identify neural stem cells [12] (see Supplementary Figure 4), but the clustering clearly separated them from the other samples that had been derived from postmortem brains of prematurely born infants (see Figure 2) [10, 11]. We used a combination of analysis tools to explore the basis of the unsupervised classification of the samples in the core dataset.
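The idea of choosing a number of groups by clustering stability can be illustrated with a small sketch. This is not the authors' sNMF implementation: it assumes plain NMF with multiplicative updates (no sparsity penalty), collapses duplicate draws in each bootstrap resample, and scores stability with a consensus-matrix dispersion measure. The function names (`nmf`, `stability`) and the toy matrix are hypothetical and serve only as illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, rng, iters=200, eps=1e-9):
    """Plain NMF (V ~ W @ H) via multiplicative updates; no sparsity penalty."""
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def stability(V, k, n_boot=20, seed=0):
    """Dispersion of the consensus matrix; 1.0 means every resample agrees."""
    rng = random.Random(seed)
    m = len(V[0])
    together = [[0] * m for _ in range(m)]
    sampled = [[0] * m for _ in range(m)]
    for _ in range(n_boot):
        # Bootstrap draw of samples (columns); duplicate draws are collapsed.
        idx = sorted({rng.randrange(m) for _ in range(m)})
        if len(idx) < k:
            continue
        W, H = nmf([[row[j] for j in idx] for row in V], k, rng)
        # Weight each factor by its total loading so the argmax ignores scale.
        weight = [sum(W[i][a] for i in range(len(W))) for a in range(k)]
        labels = [max(range(k), key=lambda a: H[a][c] * weight[a])
                  for c in range(len(idx))]
        for p in range(len(idx)):
            for q in range(p + 1, len(idx)):
                i, j = idx[p], idx[q]
                sampled[i][j] += 1
                together[i][j] += labels[p] == labels[q]
    # Score only pairs that were drawn together at least once.
    scores = [4 * (together[i][j] / sampled[i][j] - 0.5) ** 2
              for i in range(m) for j in range(i + 1, m) if sampled[i][j]]
    return sum(scores) / len(scores) if scores else 0.0

# Toy matrix (6 genes x 8 samples) with two obvious groups; values are made up.
V = [[5.0 + 0.1 * ((g + s) % 3) if (g < 3) == (s < 4) else 0.2
      for s in range(8)] for g in range(6)]
for k in (2, 3):
    print(k, round(stability(V, k), 3))
```

In a sketch like this, one would compare the score across candidate values of k and favor those whose groupings stay consistent across resamples.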
Gene Set Analysis [3] (GSA) is a means to identify the underlying themes in transcriptional data in terms of their biological relevance. GSA uses lists of genes [5] that are related in some way; the common criterion is that the relationships among the genes in the lists are supported by empirical evidence [20]. GSA highlighted numerous significant differences among the computationally defined categories (see Supplementary Figure 2, Supplementary Table 11 and Supplementary Online Materials). While GSA is valuable for discovering specific differences among sample groups, it is limited to curated gene lists and cannot be used to discover new regulatory networks. The MATISSE algorithm [6] (http://acgt.cs.tau.ac.il/matisse) takes predefined protein-protein interactions (e.g., from yeast-two-hybrid screens) and seeks connected subnetworks that manifest high similarity in sample subsets. The modified version used in this analysis is capable of extracting subnetworks that are co-expressed in many samples but also significantly up- or down-regulated in a specific sample cluster. Since the PSC preparations were consistently clustered together, we used MATISSE to look for distinctive molecular networks that might be associated with the unique PSC qualities of pluri...
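The gene-list approach described above, which tests whether curated lists of related genes stand out in one sample group, can be illustrated with a simple stand-in. The sketch below is not the GSA procedure cited in the text. It assumes a basic hypergeometric over-representation test, and the function names, gene identifiers, and gene sets are placeholders.

```python
from math import comb

def enrichment_pvalue(universe_size, set_size, n_selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap)."""
    upper = min(set_size, n_selected)
    tail = sum(comb(set_size, x) * comb(universe_size - set_size, n_selected - x)
               for x in range(overlap, upper + 1))
    return tail / comb(universe_size, n_selected)

def rank_gene_sets(gene_sets, selected, universe):
    """Return (set name, overlap, p-value) tuples, most significant first."""
    universe = set(universe)
    selected = set(selected) & universe
    results = []
    for name, genes in gene_sets.items():
        genes = set(genes) & universe
        overlap = len(genes & selected)
        p = enrichment_pvalue(len(universe), len(genes), len(selected), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Toy example with placeholder gene identifiers (not real data).
universe = [f"gene{i}" for i in range(20)]
gene_sets = {"set_A": universe[:5], "set_B": universe[10:15]}
selected = universe[:4] + universe[10:11]  # genes flagged in one sample group
for name, overlap, p in rank_gene_sets(gene_sets, selected, universe):
    print(name, overlap, f"{p:.4g}")
```

Like any list-based test, this sketch can only score the gene sets it is given, which matches the limitation noted above: curated lists cannot reveal new regulatory networks.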
Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in Southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.