2010
DOI: 10.1186/1471-2105-11-187

Artificial and natural duplicates in pyrosequencing reads of metagenomic data

Abstract: Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates. Results: We implemented a method for identification of exact and n…
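
As a rough illustration of what identifying exact duplicates among reads can involve, the Python sketch below groups reads that share an identical sequence. It is not the authors' method, which the truncated abstract does not describe; the function name `group_exact_duplicates` and the toy reads are hypothetical.

```python
from collections import defaultdict


def group_exact_duplicates(reads):
    """Group read IDs by identical sequence (case-insensitive).

    `reads` is an iterable of (read_id, sequence) pairs. Hypothetical helper
    for illustration only, not the method from the paper.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        groups[seq.upper()].append(read_id)
    # Keep only sequences that occur more than once.
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


# Toy data (made up for this example).
reads = [("r1", "ACGTAC"), ("r2", "ACGTAC"), ("r3", "ACGTTT"), ("r4", "acgtac")]
duplicates = group_exact_duplicates(reads)
```

Grouping of this kind only finds duplicated reads; it cannot by itself tell artificial duplicates from natural ones, which is the distinction the abstract raises.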

Citations: cited by 248 publications (233 citation statements)
References: 22 publications
“…Nevertheless, it is critical to differentiate minor variants from technical errors associated with library preparation and the sequencing process. By sequencing a clonal fragment, we determined for the 454 platform and the Illumina platform error rates in good accordance with data from previous studies (39,41,43,45,55). Although specialized software for additional error correction exists (33,40), we refrained from employing it in this setting due to the risk of discarding genuine minority variants.…”
Section: Discussion; type: supporting
confidence: 53%
“…The insert of one sequenced plasmid clone was excised from the vector with the restriction enzyme EcoRI and was deep sequenced in parallel with the other PCR amplicon samples. In accordance with other studies (33, 39-43), the technical error rate was calculated by counting all nucleotide variants of the plasmid reads in the alignment that did not correspond to the sequence of the clone determined by Sanger sequencing. While insertion errors were subject to automatic removal during the mapping of the sequencing reads to the HCV-J reference genome (34), deletions with respect to the reference sequence were detected during the mapping and quantified as errors but excluded from all further analyses.…”
Section: Methods; type: mentioning
confidence: 99%
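
The error-rate calculation quoted above amounts to counting positions where aligned plasmid reads disagree with the Sanger-determined clone sequence. A minimal Python sketch follows, assuming the reads are already aligned to the reference and use `-` for gaps. The function name `technical_error_rate` and the toy data are hypothetical, and skipping gap positions is a simplification made here, not the cited study's handling of insertions and deletions.

```python
def technical_error_rate(reference, aligned_reads):
    """Fraction of compared bases that differ from the reference.

    Assumes each read is aligned to `reference` position by position.
    Gap positions ('-') are skipped (a simplification for this sketch).
    """
    mismatches = 0
    compared = 0
    for read in aligned_reads:
        for ref_base, read_base in zip(reference, read):
            if ref_base == "-" or read_base == "-":
                continue
            compared += 1
            if ref_base != read_base:
                mismatches += 1
    return mismatches / compared if compared else 0.0


# Toy data (made up): one substitution in the second read, one gap in the third.
reference = "ACGTACGT"
aligned_reads = ["ACGTACGT", "ACGAACGT", "ACGTAC-T"]
rate = technical_error_rate(reference, aligned_reads)
```

Here 23 bases are compared and one differs, so `rate` equals 1/23.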
“…After mis-assigned reads were removed, the remaining Illumina reads were assembled using SPAdes assembler version 2.3.0 (Bankevich et al, 2012) (parameters: -k 21,33,45 -sc). The 454-pyrosequence reads were dereplicated using cd-hit-454 (Niu et al, 2010) with a 98% similarity cutoff and assembled using GS De Novo Assembler version 2.6 (gsAssembler, Roche) (parameters: -mi 98 -ml 50). The two assemblies were finally combined using Sequencher version 5.0.1 (Genecodes) (Lloyd et al, 2013).…”
Section: Methods; type: mentioning
confidence: 99%
“…Error rates averaged 0.1%, lower than some others NGS studies using 454 sequencing (Niu et al, 2010), which might be attributed to the fact that called alleles went through a multi-level process of screening and error filtering in the analytical pipeline. Of the few errors, most involved calling individuals homozygous when they were actually heterozygotes.…”
Section: SNP Validation; type: mentioning
confidence: 96%