2019
DOI: 10.1038/s41467-018-08148-z

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Abstract: The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also d…

Cited by 722 publications (685 citation statements)
References 67 publications
“…While the number of fully resolved SVs per genome in gnomAD-SV using the integration of multiple algorithms here (n = 7,439 SVs per genome from ~32X coverage WGS) is roughly twice that of existing references from short-read WGS, such as the 1000 Genomes Project (3,441 SVs from ~7X WGS) and the GTEx project (3,658 SVs from ~50X WGS), 1,42 it is lower than estimates from recent long-read WGS analyses (24,825 SVs from ~40X long-read WGS). 19 The technology and methods used here are thus blind to a disproportionate fraction of repeat-mediated SVs, and underestimate the true mutation rates within these hypermutable regions. Similarly, high copy state MCNVs often require specialized algorithms and manual curation to fully delineate their complicated haplotype structures, 12,65,66 suggesting that the 1,055 MCNVs reported here are an incomplete portrait of extreme copy-number polymorphisms.…”
Section: Discussion (mentioning)
confidence: 99%
“…Finally, we leveraged matched long-read WGS data available for four individuals to perform in silico confirmation of our SVs predicted from short-read WGS. 19,27,28 These analyses yielded a confirmation rate of 94.0% for SVs with breakpoint-level read evidence (92.8% of all SVs), and revealed that 59.8% of breakpoint coordinates from the gnomAD-SV callset were accurate within a single nucleotide of the long-read data, while 75.9% were accurate within ±10bp. In conclusion, despite the limitations of short-read WGS, the seven benchmarking approaches we applied here suggest that these data conform to many fundamental principles of population genetics, including Mendelian segregation, Hardy-Weinberg equilibrium, population stratification, and linkage disequilibrium, and that gnomAD-SV is sufficiently sensitive and specific to provide a contemporary resource for most applications in human genomics.…”
Section: Introduction (mentioning)
confidence: 95%