2014
DOI: 10.1101/gr.168393.113

Estimating genotype error rates from high-coverage next-generation sequence data

Abstract: Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform an…
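The abstract's idea of using replicate sequencing to bound error rates can be illustrated with a minimal sketch of a replicate-discordance estimator. This is not the paper's exact procedure; it assumes genotypes are encoded as the number of nonreference alleles (0, 1 or 2), keyed by hypothetical (chromosome, position) pairs, and that only sites with a nonreference call in at least one replicate are compared. Because a discordant site implies at least one wrong call among the two replicate calls, discordance divided by twice the number of compared sites bounds the per-call error rate from below; errors made identically in both replicates go undetected, which is why this is only a lower bound.

```python
from typing import Dict, Tuple

# Hypothetical encoding: (chromosome, position) -> number of nonreference alleles.
Calls = Dict[Tuple[str, int], int]


def replicate_discordance(rep_a: Calls, rep_b: Calls) -> Tuple[int, int]:
    """Return (discordant, compared) over sites called in both replicates
    where at least one replicate makes a nonreference call."""
    compared = 0
    discordant = 0
    for site, gt_a in rep_a.items():
        gt_b = rep_b.get(site)
        if gt_b is None:
            continue  # site lacks a call in one replicate
        if gt_a == 0 and gt_b == 0:
            continue  # restrict to nonreference genotype calls
        compared += 1
        if gt_a != gt_b:
            discordant += 1
    return discordant, compared


def error_rate_lower_bound(rep_a: Calls, rep_b: Calls) -> float:
    """Each discordant site contains at least one error among two calls,
    so errors >= discordant and error rate >= discordant / (2 * compared)."""
    discordant, compared = replicate_discordance(rep_a, rep_b)
    if compared == 0:
        return 0.0
    return discordant / (2 * compared)


# Hypothetical replicate calls for one individual.
rep_a = {("chr1", 100): 1, ("chr1", 200): 2, ("chr1", 300): 0, ("chr1", 400): 1}
rep_b = {("chr1", 100): 1, ("chr1", 200): 1, ("chr1", 300): 0, ("chr1", 400): 1}
print(error_rate_lower_bound(rep_a, rep_b))
```

In this toy input three sites are compared and one disagrees, so the bound is 1/6; real data would also need site filters (depth, quality, callable regions) that this sketch omits.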

Cited by 130 publications (121 citation statements)
References 25 publications
“…Although the error rate estimates in this region were not very different from those in the other regions (Figure S5C), depths of coverage in this region were somewhat higher compared to those in the others (Figure 7, B and D), raising the possibility of misassembly and subsequent mismapping. In addition, the overall error rate estimates were positively correlated with depths of coverage outside this region, which is consistent with recent findings that error rates in genotype calling increased with higher depths of coverage with Illumina sequencing data (Wall et al 2014).…”
Section: Results | Citation type: supporting
confidence: 90%
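The correlation described in this statement (error rate estimates versus depth of coverage, computed outside a region suspected of misassembly) can be sketched as a plain Pearson correlation across regions. The region names, depths and error rates below are hypothetical values invented for illustration, and the citing study may have used a different statistic.

```python
import math
from typing import Container, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def depth_error_correlation(
    regions: Sequence[Tuple[str, float, float]], excluded: Container[str]
) -> float:
    """Correlate mean depth with the error rate estimate across regions.

    Each region is (name, mean_depth, error_rate). Regions named in
    `excluded` (e.g. one suspected of misassembly) are dropped first.
    """
    kept = [(d, e) for name, d, e in regions if name not in excluded]
    return pearson([d for d, _ in kept], [e for _, e in kept])


# Hypothetical per-region summaries; "suspect" has unusually high depth.
regions = [
    ("r1", 20.0, 0.001),
    ("r2", 30.0, 0.002),
    ("r3", 40.0, 0.003),
    ("r4", 50.0, 0.004),
    ("suspect", 200.0, 0.0015),
]
print(depth_error_correlation(regions, excluded={"suspect"}))
```

Excluding the high-depth region before correlating mirrors the statement's reasoning: a region whose depth is inflated by mismapping would otherwise dominate the estimate.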
“…[44,92] Studies have shown that systematic errors lead to a 4% to 6% error rate; counterintuitively, the rate is higher with increasing coverage.[93] Systematic errors may be sequence-specific errors, errors at a particular location of the read (eg, the ends for Illumina sequencers), or related to the base pair content (GC rich for Illumina).[93][94][95] As neither PCR nor fixation causes insertions/deletions (indels), outside of repeat regions there is better sensitivity for detecting small indels than SNVs.…”
Section: Analytic Sensitivity | Citation type: mentioning
confidence: 99%
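Position-specific systematic errors, such as elevated error at read ends, are commonly looked for by tabulating mismatch rates per read position (sequencing cycle). A minimal sketch follows, assuming each aligned read has already been reduced to a hypothetical list of booleans marking mismatches against the reference; true variants also appear as mismatches, so in practice known variant sites would be masked before this tabulation.

```python
from typing import List, Sequence


def per_cycle_mismatch_rate(mismatch_flags: Sequence[Sequence[bool]]) -> List[float]:
    """Mismatch rate at each read position.

    mismatch_flags[i][j] is True if base j of aligned read i disagrees with
    the reference. Reads may have different lengths.
    """
    if not mismatch_flags:
        return []
    max_len = max(len(read) for read in mismatch_flags)
    mismatches = [0] * max_len
    totals = [0] * max_len
    for read in mismatch_flags:
        for j, is_mismatch in enumerate(read):
            totals[j] += 1
            mismatches[j] += int(is_mismatch)
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


# Hypothetical reads: mismatches cluster at the last cycle.
reads = [
    [False, False, False, True],
    [False, False, False, True],
    [True, False, False, False],
    [False, False, False, False],
]
print(per_cycle_mismatch_rate(reads))
```

A rate that rises toward the final positions would be consistent with the read-end errors the statement attributes to Illumina sequencers.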
“…[93] Systematic errors may be sequence-specific errors, errors at a particular location of the read (eg, the ends for Illumina sequencers), or related to the base pair content (GC rich for Illumina).[93][94][95] As neither PCR nor fixation causes insertions/deletions (indels), outside of repeat regions there is better sensitivity for detecting small indels than SNVs.[81] There are 2 main methods for improving sensitivity; however, both of these methods decrease the number of useable reads and therefore will increase the sequencing cost to obtain a comparable coverage.…”
Section: Analytic Sensitivity | Citation type: mentioning
confidence: 99%
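The cost argument in this statement reduces to simple arithmetic: usable reads needed equal coverage times target size divided by read length, and dividing by the fraction of reads that remain usable gives the raw reads to sequence. The sketch below assumes uniform coverage and ignores duplicates and mapping losses beyond the single usable fraction; the 30x depth, 3 Gb target and 150 bp read length are hypothetical values for illustration.

```python
import math


def raw_reads_needed(
    target_coverage: float,
    target_size_bp: int,
    read_length_bp: int,
    usable_fraction: float,
) -> int:
    """Raw reads to sequence so that the usable reads reach target_coverage.

    usable reads = coverage * target size / read length;
    raw reads = usable reads / usable fraction (rounded up).
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_reads = target_coverage * target_size_bp / read_length_bp
    return math.ceil(usable_reads / usable_fraction)


# Hypothetical: 30x over ~3 Gb with 150 bp reads, before and after a method
# that discards half of the reads.
print(raw_reads_needed(30, 3_000_000_000, 150, 1.0))
print(raw_reads_needed(30, 3_000_000_000, 150, 0.5))
```

Since raw reads scale as 1 / usable_fraction, a method that keeps only half of the reads roughly doubles the sequencing required for the same effective coverage.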
“…Between ~197,000 (Sponge B) and ~398,000 (Sponge D) total potential variant sites were detected in each dataset; this disparity is a direct consequence of the differences in sequencing depth between individuals (Table 4) […] significantly increase genomic coverage and, therefore, the number of detected polymorphisms. The sequencing error rate is expected to be 0.1% based on the Illumina HiSeq 2000 specifications in the year 2012 (Glenn 2011), although this value is likely to be an underestimate of the actual error rate (Wall et al 2014). Filtering of low-frequency nucleotide differences was performed prior to analysis, reducing the expected number of false positive nucleotide variants.…”
Section: Detection Of Transcriptome-wide Nucleotide Variants | Citation type: mentioning
confidence: 99%
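Why filtering low-frequency differences reduces false positives can be sketched with a binomial model: at a site with no real variant, the number of reads carrying an error is Binomial(depth, error rate) if errors are independent, and a site survives a "minimum alternate reads" filter only if at least k such reads occur. The 0.1% rate comes from the statement above; the depth, site count and thresholds are hypothetical. The model is deliberately pessimistic because it does not require the erroneous reads to agree on the same wrong base, and, as the statement notes, the true error rate may be higher than 0.1%.

```python
from math import comb


def prob_at_least_k_error_reads(depth: int, error_rate: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    X counts reads at one site carrying a sequencing error, assuming errors
    are independent across reads.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    return sum(
        comb(depth, i) * error_rate**i * (1.0 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


def expected_false_positive_sites(
    n_sites: int, depth: int, error_rate: float, min_alt_reads: int
) -> float:
    """Expected invariant sites passing the filter purely through errors."""
    return n_sites * prob_at_least_k_error_reads(depth, error_rate, min_alt_reads)


# Hypothetical: 1,000,000 invariant sites at 20x depth, 0.1% per-base error.
for k in (1, 2, 3):
    print(k, expected_false_positive_sites(1_000_000, 20, 0.001, k))
```

Raising the minimum number of supporting reads shrinks the tail probability by roughly a factor of depth times error rate per step, which is why even a modest threshold removes most error-driven calls under this model.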