2015
DOI: 10.1038/nature15394

An integrated map of structural variation in 2,504 human genomes

Abstract: Structural variants (SVs) are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight SV classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype-blocks in 26 human populations. Analyzing this set, we identify numerous gene-intersecting SVs exhibiting population stratification and describe naturally occurring homozygo…

Cited by 2,155 publications
(2,669 citation statements)
References 40 publications
“…Compared to array-based data, which commonly serve as inputs for copy-number significance analysis, sequencing-based copy-number profiles are more prone to artefact copy-number variations, for example, due to repetitive regions leading to ambiguous alignments. Thus, several filtering steps were used to eliminate false-positive GISTIC peak calls and to discover potentially cancer-relevant copy-number alterations: first, peaks overlapping with common fragile genomic sites were excluded, as these are likely to be consequences of genomic instability rather than cancer-driving events [97]; next, peaks overlapping within 1 Mb of chromosomal ends were removed, as here sequencing coverage tends to vary frequently; and last, peaks overlapping with copy-number variable regions [98] (regions ranked 1-100) were excluded. Additionally, some of the resulting peaks were classified as 'passengers' of variable regions that were called as separated peaks from most likely one event, for example, a peak with MYCNOS as passenger peak of MYCN amplification.…”
Section: Discussion (mentioning)
confidence: 99%
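
The three exclusion steps described in the statement above (fragile sites, a 1 Mb margin at chromosome ends, and copy-number variable regions) can be illustrated with a minimal sketch. This is not the citing authors' pipeline: the `Interval` type, the function names, the half-open coordinate convention, and all coordinates are hypothetical, chosen only to show the shape of such a filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Genomic interval; half-open [start, end) is an assumption of this sketch."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def near_chrom_end(peak: Interval, chrom_lengths: dict, margin: int) -> bool:
    """True if the peak reaches into the first or last `margin` bases."""
    length = chrom_lengths[peak.chrom]
    return peak.start < margin or peak.end > length - margin


def filter_peaks(peaks, fragile_sites, cnv_regions, chrom_lengths,
                 margin=1_000_000):
    """Apply the three exclusion steps in the order given in the quoted text."""
    kept = []
    for peak in peaks:
        # Step 1: drop peaks overlapping common fragile sites.
        if any(overlaps(peak, site) for site in fragile_sites):
            continue
        # Step 2: drop peaks within `margin` (1 Mb by default) of a chromosome end.
        if near_chrom_end(peak, chrom_lengths, margin):
            continue
        # Step 3: drop peaks overlapping copy-number variable regions.
        if any(overlaps(peak, region) for region in cnv_regions):
            continue
        kept.append(peak)
    return kept


# Toy data on an invented 10 Mb chromosome "chrA" (not real coordinates).
chrom_lengths = {"chrA": 10_000_000}
fragile_sites = [Interval("chrA", 3_050_000, 3_200_000)]
cnv_regions = [Interval("chrA", 7_100_000, 7_300_000)]
peaks = [
    Interval("chrA", 500_000, 600_000),      # inside the first 1 Mb
    Interval("chrA", 3_000_000, 3_100_000),  # overlaps the fragile site
    Interval("chrA", 5_000_000, 5_100_000),  # passes all three filters
    Interval("chrA", 7_000_000, 7_200_000),  # overlaps the variable region
    Interval("chrA", 9_500_000, 9_600_000),  # inside the last 1 Mb
]
kept_peaks = filter_peaks(peaks, fragile_sites, cnv_regions, chrom_lengths)
```

The quoted passage does not say how overlap was defined or how the "regions ranked 1-100" were selected, so the sketch treats any shared base as an overlap and takes the variable regions as a ready-made list.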
“…We were able to validate 271 out of 276 SVs with BAC contigs generated by SMRT sequencing (Supplementary Table 12). Compared to previous studies [6, 8–11], a total of 11,927 variants were previously unreported, which account for approximately 47% (3,465) and 76% (7,710) of all deletions and insertions, respectively (Fig. 2a and Extended Data Fig.…”
mentioning
confidence: 62%
“…Studies of L1s in the whole genome sequencing data of phase 1 of the 1000 genomes project have been published (Ewing & Kazazian, 2011; Stewart et al., 2011), and the L1s detected in these publications were annotated as known non‐reference L1s at the time of our L1‐seq analyses. Subsequently, in response to reviewer comments, we cross‐referenced our list of detected L1s with the 2015 publication on structural variants in the 1000 genomes project (Sudmant et al., 2015). Although the SYBU, DAB1, KLHL1, TBCK, PTHR2, and MACROD2 L1s were found in the 1000 genomes project, the TET2, WBSCR17, ATXN1, CTCF, DDX58, and DACH2 L1s confirmed in our study were not among those in the phase 3 data of the 1000 genomes project.…”
Section: Results (mentioning)
confidence: 99%
“…Thus, 100% of neurons and glia, heterozygous for this novel L1, are likely affected. The SYBU L1 was not detected in gDNA samples from the blood of 84 individuals of European or African descent (data not shown), but was subsequently found in the phase 3 dataset of structural variants of the 1000 genomes project (Sudmant et al., 2015) at very low minor allele frequencies (≤1%) in 2 African (GDW, MSL) and 1 European (IBS) population(s). In contrast, the TET2 and WBSCR17 L1s may be private mutations because they were found in one individual and were not found among the L1s in the phase 3 data set of the 1000 genomes project (Sudmant et al., 2015).…”
Section: Discussion (mentioning)
confidence: 99%