The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Summary: Structural variants (SVs) are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight SV classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype-blocks in 26 human populations. Analyzing this set, we identify numerous gene-intersecting SVs exhibiting population stratification and describe naturally occurring homozygous gene knockouts suggesting the dispensability of a variety of human genes. We demonstrate that SVs are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of SV complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex SVs with multiple breakpoints likely formed through individual mutational events. Our catalog will enhance future studies into SV demography, functional impact and disease association.
Glioblastoma multiforme (GBM) is a lethal brain tumour in adults and children. However, DNA copy number and gene expression signatures indicate differences between adult and paediatric cases. To explore the genetic events underlying this distinction, we sequenced the exomes of 48 paediatric GBM samples. Somatic mutations in the H3.3-ATRX-DAXX chromatin remodelling pathway were identified in 44% of tumours (21/48). Recurrent mutations in H3F3A, which encodes the replication-independent histone 3 variant H3.3, were observed in 31% of tumours, and led to amino acid substitutions at two critical positions within the histone tail (K27M, G34R/G34V) involved in key regulatory post-translational modifications. Mutations in ATRX (α-thalassaemia/mental retardation syndrome X-linked) and DAXX (death-domain associated protein), encoding two subunits of a chromatin remodelling complex required for H3.3 incorporation at pericentric heterochromatin and telomeres, were identified in 31% of samples overall, and in 100% of tumours harbouring a G34R or G34V H3.3 mutation. Somatic TP53 mutations were identified in 54% of all cases, and in 86% of samples with H3F3A and/or ATRX mutations. Screening of a large cohort of gliomas of various grades and histologies (n = 784) showed H3F3A mutations to be specific to GBM and highly prevalent in children and young adults. Furthermore, the presence of H3F3A/ATRX-DAXX/TP53 mutations was strongly associated with alternative lengthening of telomeres and specific gene expression profiles. This is, to our knowledge, the first report to highlight recurrent mutations in a regulatory histone in humans, and our data suggest that defects of the chromatin architecture underlie paediatric and young adult GBM pathogenesis.
Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs. Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY thus makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity. Availability: DELLY is available at www.korbel.embl.de/software.html Contact: tobias.rausch@embl.de