After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist 1,2. Here we present a human genome assembly that surpasses the continuity of GRCh38 2, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome 3, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes. Complete, telomere-to-telomere reference genome assemblies are necessary to ensure that all genomic variants are discovered and studied. At present, unresolved areas of the human genome are defined by multi-megabase satellite arrays in the pericentromeric regions and the ribosomal DNA arrays on acrocentric short arms, as well as regions enriched in segmental duplications that are greater than hundreds of kilobases in length and that exhibit sequence identity of more than 98% between paralogues.
Owing to their absence from the reference, these repeat-rich sequences are often excluded from genetics and genomics studies, which limits the scope of association and functional analyses 4,5. Unresolved repeat sequences also result in unintended consequences; for example, paralogous sequence variants incorrectly being called as allelic variants 6, and the contamination of bacterial gene databases 7. Completion of the entire human genome is expected to contribute to our understanding of chromosome function 8, human disease 9 and genomic variation, which will improve technologies in biomedicine that use short-read mapping to a reference genome (for example, RNA sequencing (RNA-seq) 10, chromatin immunoprecipitation followed by sequencing (ChIP-seq) 11 and assay for transposase-accessible chromatin using sequencing (ATAC-seq) 12). The fundamental challenge of reconstructing a genome from many comparatively short sequencing reads (a process known as genome assembly) is distinguishing the repeated sequences from one another 13. Resolving such repeats relies on sequencing reads that are long enough to span the entire repeat or accurate enough to distinguish each repeat copy on the basis of...
De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled eleven highly contiguous human genomes de novo in nine days. We achieved ~63x coverage, 42 Kb read N50, and 6.5x coverage in 100 Kb+ reads using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under six hours on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (QV30) with nanopore reads alone. Addition of proximity ligation (Hi-C) sequencing enabled near chromosome-level scaffolds for all eleven genomes. We compare our assembly performance to existing methods for diploid, haploid, and trio-binned human samples and report superior accuracy and speed.
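The two summary statistics quoted in the abstract above, read N50 and the QV that corresponds to 99.9% identity, can be illustrated with a minimal Python sketch. It uses the standard definitions (N50 as the length at which the longest reads first account for at least half of all bases; Phred-scaled QV as -10 · log10 of the error rate). The function names `n50` and `identity_to_qv` and the toy read lengths are hypothetical and are not taken from Shasta, MarginPolish or HELEN.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Walking from the longest read downward, N50 is the length of the read
    at which the running total first reaches half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def identity_to_qv(identity):
    """Convert fractional identity to a Phred-scaled quality value.

    QV = -10 * log10(error rate), where error rate = 1 - identity.
    """
    error = 1.0 - identity
    if error <= 0:
        raise ValueError("identity must be below 1.0")
    return -10.0 * math.log10(error)


# Hypothetical toy read lengths in bases; not data from the paper.
toy_reads = [100_000, 60_000, 42_000, 30_000, 20_000, 10_000]

print(n50(toy_reads))                    # 60000
print(round(identity_to_qv(0.999), 1))   # 30.0
```

With 99.9% identity the error rate is 0.001, so the second call reproduces the QV30 figure quoted in the abstract.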
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Background: Insulin-producing beta cells and glucagon-producing alpha cells are colocalized in pancreatic islets in an arrangement that facilitates the coordinated release of the two principal hormones that regulate glucose homeostasis and prevent both hypoglycemia and diabetes. However, this intricate organization has also complicated the determination of the cellular source(s) of the expression of genes that are detected in the islet. This reflects a significant gap in our understanding of mouse islet physiology, which reduces the effectiveness with which mice model human islet disease. Results: To overcome this challenge, we generated a bitransgenic reporter mouse that faithfully labels all beta and alpha cells in mouse islets to enable FACS-based purification and the generation of comprehensive transcriptomes of both populations. This facilitates systematic comparison across thousands of genes between the two major endocrine cell types of the islets of Langerhans whose principal hormones are of cardinal importance for glucose homeostasis. Our data, leveraged against similar data for human beta cells, reveal a core common beta cell transcriptome of 9900+ genes. Against the backdrop of overall similar beta cell transcriptomes, we describe marked differences in the repertoire of receptors and long non-coding RNAs between mouse and human beta cells. Conclusions: The comprehensive mouse alpha and beta cell transcriptomes, complemented by the comparison of the global (dis)similarities between mouse and human beta cells, represent invaluable resources to boost the accuracy with which rodent models offer guidance in finding cures for human diabetes. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-620) contains supplementary material, which is available to authorized users.