2021
DOI: 10.1101/2021.02.16.429304
Preprint

kmtricks: Efficient and flexible construction of Bloom filters for large sequencing data collections

Abstract: When indexing large collections of sequencing data, a common operation that has now been implemented in several tools (Sequence Bloom Trees and variants, BIGSI, etc.) is to construct a collection of Bloom filters, one per sample. Each Bloom filter is used to represent a set of k-mers which approximates the desired set of all the non-erroneous k-mers present in the sample. However, this approximation is imperfect, especially in the case of metagenomics data. Erroneous but abundant k-mers are wrongly included, and …
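
As a rough illustration of the operation the abstract describes, the sketch below builds one Bloom filter for a single sample from its k-mers. It is not the kmtricks implementation or API: the names `kmers`, `BloomFilter`, and `build_sample_filter`, the BLAKE2-based hashing, and the filter size are all hypothetical choices made for this example. The `min_abundance` cutoff is likewise an assumption, used here as a simple stand-in for approximating the set of non-erroneous k-mers.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function (an assumption of this
        # sketch, not a claim about how any real tool hashes k-mers).
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True for every inserted item; may also be True for items never
        # inserted (false positives), never False for inserted ones.
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item)
        )


def build_sample_filter(reads, k, min_abundance, n_bits=1 << 16, n_hashes=3):
    """Count k-mers in one sample and insert those seen at least
    min_abundance times into a fresh Bloom filter."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    bf = BloomFilter(n_bits, n_hashes)
    for km, count in counts.items():
        if count >= min_abundance:
            bf.add(km)
    return bf


if __name__ == "__main__":
    sample_reads = ["ACGTACGT", "ACGTACGA"]  # toy data, not from the paper
    bf = build_sample_filter(sample_reads, k=4, min_abundance=2)
    print("ACGT" in bf)  # inserted k-mers are always reported as present
```

A per-sample cutoff of this kind keeps any k-mer that is abundant enough, whether or not it is erroneous, which is the imperfection the abstract points to.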

Cited by 11 publications
(11 citation statements)
References 41 publications
“…MetaProFi was able to build k-mer indices for multiple datasets demonstrating it can scale in any direction; on the contrary, we failed to reproduce the reported results for other tools. For example, we attempted to construct kmtricks [8] Bloom filter of the Tara Oceans dataset three times using two different sets of parameters (see methods for details). The first attempt was terminated after 9 hrs due to the consumption of the entire local disk space of 2.9 TiB while using the default parameters.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…We used kmtricks [27] for an optimized construction of HowDeSBT and its Bloom filters (commit number 532d545). SeqOthello (commit 68d47e0) uses Jellyfish [28] for its pre-processing, we worked with version 2.3.0.…”
Section: Preamble (citation type: mentioning)
confidence: 99%