2019
DOI: 10.1186/s12859-019-2684-x

From trash to treasure: detecting unexpected contamination in unmapped NGS data

Abstract: Background: Next Generation Sequencing (NGS) experiments produce millions of short sequences that, mapped to a reference genome, provide biological insights at genomic, transcriptomic and epigenomic level. Typically the amount of reads that correctly maps to the reference genome ranges between 70% and 90%, leaving in some cases a consistent fraction of unmapped sequences. This ’misalignment’ can be ascribed to low quality bases or sequence differences between the sample reads and the reference geno…

Citations: cited by 71 publications (61 citation statements)
References: 64 publications
“…The lower value can be explained by ~5% Kraken-II FP rate when run under its default setting. The upper value is consistent with ~10% contamination level, a scenario that can happen in real sequencing projects (Merchant, Wood, & Salzberg, 2014; Sangiovanni, Granata, Thind, & Guarracino, 2019).…”
Section: Usefulness Of Theoretical Models
Type: supporting
confidence: 79%
“…This observation implies the presence of high sequence similarity at the species level. We calculated the ratios by running PathSeq [18], FastQ Screen [28], and DecontaMiner [29] (Additional file 2). Of note, comparing existing pipelines is not straightforward because different aligners are employed and databases are inaccessible in some cases.…”
Section: Results
Type: mentioning
confidence: 99%
“…The network data were visualized by using software Cytoscape (V.3.5.1). PathSeq [18], FastQ Screen [28], and DecontaMiner [29] were installed with their reference databases. Because FastQ Screen accepts limited number of genomes, the input reads were mapped to ten specific genomes only.…”
Section: Methods
Type: mentioning
confidence: 99%
“…The network data were visualized by using software Cytoscape (V.3.5.1). PathSeq [18], FastQ Screen [28], and DecontaMiner [29] were installed with their reference databases. Because FastQ Screen accepts limited number of genomes, the input reads were mapped to 10 specific genomes only.…”
Section: Methods
Type: mentioning
confidence: 99%