2014
DOI: 10.1038/nature13907

Resolving the complexity of the human genome using single-molecule sequencing

Abstract: The human genome is arguably the most complete mammalian reference assembly [1–3], yet more than 160 euchromatic gaps remain [4–6] and aspects of its structural variation remain poorly understood ten years after its completion [7–9]. In order to identify missing sequence and genetic variation, we sequenced and analyzed a haploid human genome (CHM1) using single-molecule, real-time (SMRT) DNA sequencing [10]. We closed or extended 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carri…

Cited by 756 publications (852 citation statements)
References 26 publications
“…We also extended into 72 of the 85 remaining gaps with the addition of 663 kb of sequence into 4.1 Mb. These locations, previously intractable using only short reads, commonly contained simple tandem repeats, as reported previously [6,9]. One example (Fig.…”
mentioning
confidence: 56%
“…We were able to validate 271 out of 276 SVs with BAC contigs generated by SMRT sequencing (Supplementary Table 12). Compared to previous studies [6,8–11], a total of 11,927 variants were previously unreported, which account for approximately 47% (3,465) and 76% (7,710) of all deletions and insertions, respectively (Fig. 2a and Extended Data Fig.…”
mentioning
confidence: 66%
“…Sensitivity also varies as a function of size, with both ends of the SV spectrum adversely affected. Comparisons with SVs resolved using long-read sequencing technologies [e.g., single-molecule, real-time (SMRT) or Pacific Biosciences sequencing technology] suggest that the majority (>80%) of insertions and deletions between 50 bp and 1 kbp in length are missed using short-read sequencing technologies (Figure 1) (Chaisson et al 2015a), irrespective of frequency. These results argue that most widely used sequencing technologies are insufficient, because short reads fail to detect and accurately genotype a large fraction of SVs.…”
mentioning
confidence: 99%
“…Detailed targeted sequencing of regions of the human genome suggests that indels should occur at approximately one-tenth of the frequency of SNVs (Bhangale et al 2005), suggesting that the current catalog may be missing at least 30-40% of all indels. Detection of indels associated with short tandem repeat (STR) sequences is particularly challenging and specialized methods have been developed to discover and accurately genotype these from next-generation sequencing datasets (Karakoc et al 2012; Narzisi et al 2014; Willems et al 2014; Chaisson et al 2015a). Sensitivity for indel variant discovery is generally much lower than for SNVs.…”
mentioning
confidence: 99%