High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species 1–4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species 1-4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.
Background The ability to imitate the vocalizations of other organisms, a trait known as vocal learning, is shared by only a few organisms, including humans, where it subserves the acquisition of speech and language, and three groups of birds. In songbirds, vocal learning requires the coordinated activity of a set of specialized brain nuclei referred to as the song control system. Recent efforts have revealed some of the genes that are expressed in these vocal nuclei; however, a thorough characterization of the transcriptional specializations of this system is still missing. We conducted a rigorous and comprehensive analysis of microarrays, and a separate analysis of 380 genes by in situ hybridization, to identify molecular specializations of the major nuclei of the song system of zebra finches (Taeniopygia guttata), a songbird species. Results Our efforts identified more than 3300 genes that are differentially regulated in one or more vocal nuclei of adult male birds compared to the adjacent brain regions. Bioinformatics analyses provided insights into the possible involvement of these genes in molecular pathways such as cellular morphogenesis, intrinsic cellular excitability, neurotransmission and neuromodulation, axonal guidance and cell-to-cell interactions, and cell survival, which are known to strongly influence the functional properties of the song system. Moreover, an in-depth analysis of specific gene families with known involvement in regulating the development and physiological properties of neuronal circuits provides further insights into possible modulators of the song system. Conclusion Our study represents one of the most comprehensive molecular characterizations of a brain circuit that evolved to facilitate a learned behavior in a vertebrate. The data provide novel insights into possible molecular determinants of the functional properties of the song control circuitry.
The study also provides lists of compelling targets for pharmacological and genetic manipulations to elucidate the molecular regulation of song behavior and vocal learning. Electronic supplementary material The online version of this article (10.1186/s12864-018-4578-0) contains supplementary material, which is available to authorized users.
Background Vocal learning, the ability to learn to produce vocalizations through imitation, relies on specialized brain circuitry known in songbirds as the song system. While the connectivity and various physiological properties of this system have been characterized, the molecular genetic basis of neuronal excitability in song nuclei remains understudied. We have focused our efforts on examining voltage-gated ion channels to gain insight into electrophysiological and functional features of vocal nuclei. A previous investigation of potassium channel genes in zebra finches (Taeniopygia guttata) revealed evolutionary modifications unique to songbirds, as well as transcriptional specializations in the song system [Lovell PV, Carleton JB, Mello CV. BMC Genomics 14:470 2013]. Here, we expand this approach to sodium, calcium, and chloride channels along with their modulatory subunits, using comparative genomics and gene expression analysis encompassing microarrays and in situ hybridization. Results We found 23 sodium, 38 calcium, and 33 chloride channel genes (HGNC-based classification) in the zebra finch genome, several of which were previously unannotated. We determined that 15 genes are missing relative to mammals, including several genes (CLCAs, BEST2) linked to olfactory transduction. The majority of sodium and calcium channels, but few chloride channels, showed differential expression in the song system, among them SCN8A and CACNA1E in the direct motor pathway, and CACNG4 and RYR2 in the anterior forebrain pathway. In several cases, we noted a seemingly coordinated pattern across multiple nuclei (SCN1B, SCN3B, SCN4B, CACNB4) or sparse expression (SCN1A, CACNG5, CACNA1B). Conclusion The gene families examined are highly conserved between avian and mammalian lineages.
Several cases of differential expression likely support high-frequency and burst firing in specific song nuclei, whereas cases of sparse patterns of expression may contribute to the unique electrophysiological signatures of distinct cell populations. These observations lay the groundwork for manipulations to determine how ion channels contribute to the neuronal excitability properties of vocal learning systems. Electronic supplementary material The online version of this article (10.1186/s12864-019-5871-2) contains supplementary material, which is available to authorized users.