2020
DOI: 10.1101/2020.03.06.980979
Preprint

Embedding the de Bruijn graph, and applications to metagenomics

Abstract: Fast mapping of sequencing reads to taxonomic clades is a crucial step in metagenomics, which however raises computational challenges as the numbers of reads and of taxonomic clades increase. Besides alignment-based methods, which are accurate but computationally costly, faster compositional approaches have recently been proposed to predict the taxonomic clade of a read based on the set of k-mers it contains. Machine learning-based compositional approaches, in particular, have recently reached accuracies simila…

Cited by 6 publications (7 citation statements)
References 15 publications
“…Reads falling in the same bucket share the same embedding. On the other hand, fastDNA has been enhanced with BRUME [127]. The idea here is that k-mers that are always present or absent together in the same reads should be considered as having the same importance in sequence embedding.…”
Section: Results (mentioning)
confidence: 99%
“…Prior work for using machine learning in taxonomic classification either relied on testing with simulated reads [14,15,30] or did not show advantages compared to conventional k-mer-matching and mapping-based approaches and real data [11,13]. MetageNN is, to the best of our knowledge, the first machine learning-based method that shows improvements relative to conventional tools with real long-read data, when assessing performance for genomes that are not in the database.…”
Section: Discussion (mentioning)
confidence: 99%
“…Prior work for using machine learning in taxonomic classification either relied on testing with simulated reads [15,31,14] or did not show advantages compared to conventional k-mer-matching and mapping-based approaches and real data [11,13]. MetageNN is, to the best of our knowledge, the first machine learning-based method that shows improvements relative to conventional tools with real long-read data, when assessing performance for genomes that are not in the database.…”
Section: Discussion (mentioning)
confidence: 99%