2020
DOI: 10.1038/s41586-020-2287-8

A structural variation reference for medical and population genetics

Abstract: Structural variants (SVs) rearrange large segments of DNA1 and can have profound consequences in evolution and human disease2,3. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)4 have become integral in the interpretation of single-nucleotide variants (SNVs)5. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to tho…

citations
Cited by 752 publications
(843 citation statements)
references
References 52 publications
“…The initial discovery effort from the 1000 Genomes Project [12,13] revealed that a diverse landscape of SVs could be captured from srWGS with just 4-7X coverage (3,422 SVs per genome), and more recent population genetic and human disease studies using deeper (30X or higher) srWGS and diverse methods have varied in estimates of SVs that can be captured using srWGS from 401-10,884 per genome, with the highest end of this range generated from the Human Genome Structural Variation Consortium (HGSVC; Figure 1A) [1,13-18]. Emerging long-read WGS (lrWGS) technologies, which involve sequencing thousands to millions of contiguous nucleotides from a single strand of DNA, are better suited for SV discovery than srWGS. The most widely tested lrWGS technologies include single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio) and sequencing by ionic current through a nanopore channel (Oxford Nanopore Technologies [ONT]).…”
Section: Main Text (mentioning)
confidence: 99%
“…The number and composition of SVs should be compared with previous studies of similar cohorts to identify problematic samples and SVs. For germline SVs, a recent large population-based study reported an average of 4400 germline SVs per individual (Abel et al, 2020), and the Genome Aggregation Database reported 7400 germline SVs per individual on average (Collins et al, 2020). Both studies were based on Illumina short reads.…”
Section: Quality Control (mentioning)
confidence: 99%
“…As possible LINE1 source elements, we first extracted 5,228 full-length recent primate-specific LINE1 elements from the human reference genome (reference putative LINE1 source elements). In addition, since it is known that there are several active non-reference LINE1 source elements, which are not included in the reference genome but can be detected as polymorphic insertions, we also included 652 and 2610 full-length LINE1 insertions identified in 1000 genomes Phase 3 [42] and gnomAD v2.1 [43], respectively. Furthermore, when many inserted sequences were aligned to the same genomic locations, we searched for the germline LINE1 insertion near those positions from the normal sequence data and manually curated the putative rare germline LINE1 insertions that were considered as the source of LINE1 transduction.…”
Section: Characterization Of Mobile Element Insertions (mentioning)
confidence: 99%