2011
DOI: 10.1038/npre.2011.5989.1

Identification and correction of systematic error in high-throughput sequence data

Abstract: A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specifi…

Cited by 25 publications (32 citation statements)
References 10 publications
“…Initial benchmarking growth assays were performed with strains and in contrast to previous reports [15,32,33], we found that Y. lipolytica strains PO1f and E26 were unable to grow in minimal media with xylose as the only carbon source. This result is consistent with additional reports in literature [34,35].…”
Section: Results (contrasting)
confidence: 79%
“…The HiSeq generated 25 337 178 read pairs that covered the Y. lipolytica genome approximately 225 times. The sequencing results were mapped to the CLIB122 genome using BWA; and Samtools, BEDTools and IGV were used to analyze the data [30-33].…”
Section: Bioinformatics (mentioning)
confidence: 99%
“…Among the challenges of correctly separating the true mutations from sequencing related errors is the presence of the non-uniform error rates in the sequencing data [15, 20-22]. We have shown that even at Q30, ORP mismatch rates have a small but significant distribution (Figure 2).…”
Section: Discussion (mentioning)
confidence: 93%
“…Their approach relied on an estimated error rate from the sequencer without the use of sequencing controls and included sequencing each sample twice to correct for sequencing error, which would present practical problems for sequencing larger numbers of samples collected from an outbreak. Moreover, recent work has indicated the presence of non-uniform error rates in Illumina sequence data in particular, and highlights the ongoing challenge of correctly separating the true mutant spectra from sequencing related errors [15, 22-24].…”
Section: Introduction (mentioning)
confidence: 99%
“…; Meacham et al.; Minoche, Dohm & Himmelbauer; Quince et al.; Benjamini & Speed; Victoria et al.…”
Section: Methods (unclassified)