2010
DOI: 10.1038/nmeth.1451

Characterization of missing human genome sequences and copy-number polymorphic insertions

Abstract: The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substantial fraction of these sequences are either missing, fragmented or mis-assigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determine that 18–37% of these no…

Cited by 128 publications (157 citation statements)
References 25 publications
“…[91] A portion of these described haplotypes were also shown to represent 'common' alleles, highlighting the fact that the current human reference assembly represents the minor allele for many loci across the genome; a point that is particularly important if the assembly sequence represents an allelic deletion.[91] With respect to IGHV, it is interesting to point out that many of the identified alternate haplotypes known in the cluster also represent variants that are at higher frequencies than those found in the reference genome.[30,35] The recent development of whole-genome NGS and assembly has expanded our ability to identify and genotype SNPs and structural variants (insertions, deletions and inversions).…”
Section: Additional Pitfalls and Future Prospects in IGHV Genetics (mentioning; confidence: 98%)
“…Extensive efforts have been undertaken in recent years to better characterize genomic structural variants in such loci. [89][90][91] Many of these have been discovered and sequenced by fosmid-end mapping and the generation of complete sequence from 'discordant' clones, relative to the reference genome assembly.…”
Section: Additional Pitfalls and Future Prospects in IGHV Genetics (mentioning; confidence: 99%)
“…Each pool consisted of only 30 fosmids to avoid ambiguities owing to segmental duplications in the assembly process. And each pool was sequenced independently to reduce the complexity of assembling the reads and problems caused by repetitive sequences[41]. Using the contigs assembled from fosmids, we validated 11 (73%) of the remaining 15 structural variations, which indicates that failure to amplify a variant by PCR is not necessarily indicative of a spurious structural variation call (Supplementary Table 2).…”
Section: Precision and Sensitivity of Structural Variant Calls (mentioning; confidence: 99%)
“…Since the reference genome is predominantly of European ancestry [98,99,100], populations with non-European ancestry generally have more variation with respect to the reference genome than populations of European ancestry (see Table 14). Therefore, to interpret the results of this study, one might conclude that non-European populations have higher rates of sequencing error than European descent populations.…”
Section: Genomes Project Data (mentioning; confidence: 99%)