2017
DOI: 10.1093/bioinformatics/btx106

Pseudoalignment for metagenomic read assignment

Abstract: We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data. In particular, we show that the recent idea of pseudoalignment introduced in the RNA-Seq context is suitable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software.
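
To make the EM step mentioned in the abstract concrete, here is a minimal sketch of EM over equivalence classes, the usual output of a pseudoalignment step. It is an illustration under simplifying assumptions, not the authors' implementation: the function name `em_abundances`, the input layout (a dict from tuples of reference indices to read counts), and the fixed iteration count are all hypothetical, and reference-length normalization is deliberately left out.

```python
def em_abundances(eq_classes, n_refs, n_iter=100):
    """Estimate relative abundances of references with a plain EM loop.

    eq_classes: dict mapping a tuple of reference indices (the set of
                references a group of reads is compatible with) to the
                number of reads in that group. Hypothetical layout.
    n_refs:     total number of references.
    Assumption: no length normalization, fixed number of iterations.
    """
    alpha = [1.0 / n_refs] * n_refs  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_refs
        # E-step: split each class's reads across its compatible
        # references in proportion to the current abundances.
        for refs, n_reads in eq_classes.items():
            denom = sum(alpha[r] for r in refs)
            if denom == 0.0:
                continue  # empty or fully zero-weight class: skip
            for r in refs:
                counts[r] += n_reads * alpha[r] / denom
        # M-step: renormalize expected counts into abundances.
        total = sum(counts)
        alpha = [c / total for c in counts]
    return alpha


# Toy input: 30 reads unique to reference 0, 10 unique to reference 1,
# and 20 ambiguous reads compatible with both.
toy_classes = {(0,): 30, (1,): 10, (0, 1): 20}
toy_alpha = em_abundances(toy_classes, n_refs=2)
```

In the toy input the ambiguous reads end up split 3:1 in favor of reference 0, because the uniquely assigned reads already favor it by that ratio; the estimate settles at abundances of 0.75 and 0.25.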

Citations: cited by 105 publications (111 citation statements)
References: 32 publications
“…Therefore, any assumption that Kraken's raw read assignments can be directly translated into species- or strain-level abundance estimates (e.g., Schaeffer et al, 2015) is flawed, as ignoring reads at higher levels of the taxonomy will grossly underestimate some species, and creates the erroneous impression that Kraken's assignments themselves were incorrect.…”
Section: Classification Versus Abundance Estimation
Citation type: mentioning
confidence: 99%
“…We compared the performance of mSWEEP against two existing methods capable of either strain or lineage identification: metakallisto [11] and BIB [7]. The main differences between the methods are that metakallisto attempts to identify individual strains based on all available sequences, BIB uses grouped reference sequences with a single representative sequence from each group to assign abundances to the groups, and mSWEEP identifies the presence of groups by using grouped reference sequences with all the available sequence as representatives.…”
Section: Assigning Single-colony Isolates To Lineage
Citation type: mentioning
confidence: 99%
“…They are very efficient but limited in terms of sensitivity. Metakallisto (27), a pseudo-alignment technique for metagenomic taxonomic classification, relies on the idea of approximate k-mer matching via a k-mer de Bruijn graph, an idea originally introduced by Kallisto (28) for quantifying transcript abundances in RNAseq data. Other approaches rely on supervised machine learning (ML) classifiers, such as, Naïve Bayes (NB) or support vector machines (SVM), trained on a set of reference sequences to classify taxonomic origins of metagenomic reads using their relative k-mer frequency vectors (29; 30).…”
Section: Introduction
Citation type: mentioning
confidence: 99%