2015
DOI: 10.1186/s12864-015-1419-2

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Abstract:
Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce.
Results: We introduce Clark, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classifi…
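The "discriminative k-mers" in the title are k-mers that occur in the reference sequences of exactly one target (e.g. one species), so a hit on such a k-mer points to a single candidate. A minimal Python sketch of building such an index follows; the function name, the toy targets "A" and "B", and the choice to ignore reverse complements are illustrative assumptions, not Clark's actual implementation or data structures.

```python
from __future__ import annotations

from collections import defaultdict


def build_discriminative_index(targets: dict[str, list[str]], k: int) -> dict[str, str]:
    """Map each k-mer to its target, keeping only k-mers found in exactly one target.

    `targets` maps a hypothetical target name to its reference sequences.
    Reverse complements / canonical k-mers are deliberately ignored here.
    """
    owners: dict[str, set[str]] = defaultdict(set)
    for target, seqs in targets.items():
        for seq in seqs:
            # Record every contiguous k-mer of this reference sequence.
            for i in range(len(seq) - k + 1):
                owners[seq[i:i + k]].add(target)
    # A k-mer is discriminative if exactly one target contains it.
    return {kmer: next(iter(ts)) for kmer, ts in owners.items() if len(ts) == 1}


index = build_discriminative_index({"A": ["ACGTT"], "B": ["CGTAA"]}, k=3)
print(sorted(index.items()))  # [('ACG', 'A'), ('GTA', 'B'), ('GTT', 'A'), ('TAA', 'B')]
```

The shared 3-mer `CGT` is dropped because it cannot tell the two targets apart; everything that remains votes for exactly one target.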

Cited by 577 publications (578 citation statements); references 33 publications.

“…For contiguous k-mers, the classification precision increases as we increase k. However, the highest sensitivity occurs with somewhat shorter k-mers. Clark is more precise for long contiguous k-mers (e.g., k = 31), but the highest sensitivity occurs for k-mers of length 19-22 [19]. As a consequence, we considered here spaced seeds of length k = 31 and weight w = 22.…”
Section: Selection of Optimal Spaced Seeds and Index Creation (mentioning; confidence: 99%)
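A spaced seed of length k = 31 and weight w = 22, as quoted above, can be read as a 31-position binary mask in which 22 positions are used ('1') and 9 are ignored ('0'). The minimal Python sketch below extracts spaced k-mers with such a mask; `SEED` is a hypothetical mask with that shape, not one of the seeds the citing work actually selected.

```python
from __future__ import annotations

# Hypothetical spaced seed with the quoted shape: length k = 31, weight w = 22.
# '1' = base is kept, '0' = base is ignored. NOT an actual CLARK-S seed.
SEED = "11" + "011" * 9 + "11"
assert len(SEED) == 31 and SEED.count("1") == 22


def spaced_kmers(seq: str, seed: str = SEED) -> list[str]:
    """Return the spaced k-mer for every window of len(seed) bases in `seq`."""
    k = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]  # positions that are kept
    return ["".join(seq[start + i] for i in care) for start in range(len(seq) - k + 1)]


# Toy seed "1101" (k = 4, w = 3): windows differing only at the '0' position collide.
print(spaced_kmers("ACGT", "1101"), spaced_kmers("ACAT", "1101"))  # ['ACT'] ['ACT']
```

Because '0' positions are skipped, a mismatch there does not change the key, so a seed can span 31 bases (keeping the precision associated with long k-mers) while only 22 bases must match exactly.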
“…The classification algorithm of the "Spaced" mode is identical to that of the "full" mode (extensively described in [19]), except for two differences, namely (i) Clark-S queries against discriminative spaced k-mers instead of discriminative k-mers and (ii) Clark-S does three queries for each k-mer in a read, because there are three different databases. Finally, as done in the full and other modes, the read is assigned to the target that has the highest amount of successful queries, and several statistics (such as the confidence score and gamma score, see [19]) are computed as well.…”
Section: Selection of Optimal Spaced Seeds and Index Creation (mentioning; confidence: 99%)
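The read-assignment step described in the quotation above (query every window against three spaced-seed databases, then pick the target with the most successful queries) can be sketched in Python as follows. The databases here are plain dicts from a discriminative spaced k-mer to one target, the toy seeds have length 4 rather than 31, and the confidence formula h1 / (h1 + h2) over the top two hit counts is an assumption modeled on the description in [19]; the gamma score is omitted.

```python
from __future__ import annotations

from collections import Counter


def spaced_key(read: str, start: int, seed: str) -> str:
    """Spaced k-mer of `read` at offset `start`: keep bases where `seed` has '1'."""
    return "".join(read[start + i] for i, c in enumerate(seed) if c == "1")


def classify_read(read: str, databases: list[tuple[str, dict[str, str]]]) -> tuple[str | None, float]:
    """Assign `read` to the target with the most successful queries.

    Each database is a hypothetical (seed, index) pair; `index` maps a
    discriminative spaced k-mer to its single target. Each window is
    queried once per database (three queries per window for three databases).
    """
    hits: Counter[str] = Counter()
    for seed, index in databases:
        for start in range(len(read) - len(seed) + 1):
            target = index.get(spaced_key(read, start, seed))
            if target is not None:
                hits[target] += 1
    if not hits:
        return None, 0.0  # no successful query: read stays unassigned
    ranked = hits.most_common(2)
    h1 = ranked[0][1]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    # Assumed confidence score: h1 / (h1 + h2) over the two highest hit counts.
    return ranked[0][0], h1 / (h1 + h2)


# Toy databases with hypothetical targets T1 and T2.
dbs = [
    ("1101", {"ACT": "T1", "CGA": "T1", "GTC": "T2"}),
    ("1011", {"AGT": "T1"}),
    ("1111", {"GTAC": "T2"}),
]
print(classify_read("ACGTAC", dbs))  # ('T1', 0.6)
```

T1 collects 3 hits and T2 collects 2, so the read goes to T1 with confidence 3 / 5. Using a separate index per seed keeps each lookup a single hash probe, at the cost of storing three databases.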