2024
DOI: 10.1101/2024.01.29.577700
Preprint

Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets

Igor Martayan,
Bastien Cazaux,
Antoine Limasset
et al.

Abstract: In this paper, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, de…
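
The abstract's central idea, replacing each k-mer with its smallest cyclic rotation, can be illustrated with a minimal sketch. This is not the CBL library's code: the function names `smallest_rotation` and `kmers` are hypothetical, the rotation is found by a naive O(k²) scan over plain strings, and k is assumed to be at least 1. It only shows what "smallest cyclic rotation" means for a k-mer.

```python
from typing import Iterator


def smallest_rotation(kmer: str) -> str:
    """Return the lexicographically smallest cyclic rotation of `kmer`.

    Naive approach: build all len(kmer) rotations and take the minimum.
    Assumes a non-empty k-mer (illustrative helper, not a CBL API).
    """
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every substring of length k of `seq`, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


if __name__ == "__main__":
    # Print each 4-mer of a toy sequence next to its smallest rotation.
    for km in kmers("GATTACA", 4):
        print(km, "->", smallest_rotation(km))
```

Several distinct k-mers can share the same smallest rotation, so a structure built on rotations presumably also has to record which rotation was applied; how CBL does this is not described in the excerpt above.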

Cited by 4 publications (8 citation statements)
References 55 publications
“…Experimental evaluation of indexing. We compared time and memory requirements for processing both positive and negative queries of FMSI to state-of-the-art programs for indexing individual k-mer sets, namely, BWA [38], a state-of-the-art aligner based on the FM index; for processing queries, we used the fastmap command [36], run with parameter w = 999999 on the simplitigs computed by ProphAsm [11], SBWT [1], an index based on the spectral Burrows-Wheeler transform; we used the default plain-matrix variant as it achieves the best query times in [1] and added all reverse complements to the index, and CBL [45], a very recent method based on smallest cyclic rotations of k-mers. We have run FMSI on the masked superstrings computed by KmerCamel [61], specifically the global and local greedy algorithms (local is run with d = 1).…”
Section: Experimental Evaluation
Type: mentioning
confidence: 99%
“…We compared FMSI against state-of-the-art single 𝑘-mer set indexes such as BWA [38], SBWT [1], SSHash [52], and CBL [46] and evaluated them on E. coli pan-genome (obtained as a 𝑘-mer union over E. coli genomes from the Fig. 1: Time and memory efficiency of 𝑘-mer membership queries.…”
Section: Implementation and Experimental Evaluation
Type: mentioning
confidence: 99%
“…We demonstrate how to construct subsets of k-mers using these tags and how to simplify query operations to reduce computational delays. We implement a solution using a recent k-mer set data-structure that was tailored for set operations [11]. We demonstrate that our method is capable of handling thousands of datasets efficiently.…”
Section: Introduction
Type: mentioning
confidence: 99%