2022
DOI: 10.1101/2022.05.19.492613
Preprint

Succinct k-mer Sets Using Subset Rank Queries on the Spectral Burrows-Wheeler Transform

Abstract: The k-spectrum of a string is the set of all distinct substrings of length k occurring in the string. This is a lossy but computationally convenient representation of the information in the string, with many applications in high-throughput bioinformatics. In this work, we define the notion of the Spectral Burrows-Wheeler Transform (SBWT), which is a sequence of subsets of the alphabet of the string encoding the k-spectrum of the string. The SBWT is a distillation of the ideas found in the BOSS and Wheeler grap…
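To make the definition concrete, here is a minimal Python sketch of the k-spectrum; the function name `k_spectrum` is our own and does not come from the paper. The second example illustrates why the abstract calls the representation lossy: two different strings can share the same spectrum.

```python
def k_spectrum(s, k):
    """The k-spectrum of s: the set of distinct substrings of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {s[i:i + k] for i in range(len(s) - k + 1)}


# 7 k-mer occurrences for k = 3, but only 5 distinct k-mers (ACG and CGT repeat).
print(sorted(k_spectrum("ACGTACGTT", 3)))  # ['ACG', 'CGT', 'GTA', 'GTT', 'TAC']

# The representation is lossy: a longer string has exactly the same 3-spectrum.
print(k_spectrum("ACGTACGTT", 3) == k_spectrum("ACGTACGTACGTT", 3))  # True
```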

Cited by 15 publications (35 citation statements). References: 50 publications.
“…When built over an SPSS, stores the k-mers by their order of appearance in the strings (which we term tiles) of an SPSS and thus allows easy computation of a k-mer’s offset into a tile. Other methods based on the Burrows-Wheeler transform (BWT) [8], such as the Spectral BWT [25] and BOSS [27], could also be used. However, these methods implicitly sort k-mers in lexicographical order and would likely need an extra level of indirection to implement.…”
Section: Spectrum Preserving Tilings (mentioning)
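The excerpt's point is that when k-mer identifiers follow order of appearance in the tiles, a k-mer's offset within its tile falls out of simple arithmetic, whereas a rank produced by a BWT-based index would first have to be mapped to such a position. A minimal sketch, assuming a plain Python dict as a stand-in for whatever k-mer dictionary the elided method uses; `index_tiles`, `tile_starts`, and `locate` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


def index_tiles(tiles, k):
    """Assign each k-mer its position in order of appearance across the tiles.

    Assumes an SPSS, i.e. every k-mer occurs exactly once over all tiles.
    """
    ids, g = {}, 0
    for tile in tiles:
        for i in range(len(tile) - k + 1):
            ids[tile[i:i + k]] = g
            g += 1
    return ids


def tile_starts(tiles, k):
    """Global id of the first k-mer of each tile (a tile of length L holds L - k + 1 k-mers)."""
    counts = [len(t) - k + 1 for t in tiles]
    return [0] + list(accumulate(counts))[:-1]


def locate(global_id, starts):
    """Map a global k-mer id to (tile index, offset of the k-mer within that tile)."""
    t = bisect_right(starts, global_id) - 1
    return t, global_id - starts[t]


tiles = ["ACGTT", "GGCA"]
ids = index_tiles(tiles, 3)
starts = tile_starts(tiles, 3)
print(starts, locate(ids["GCA"], starts))  # [0, 3] (1, 1)
```

Only a binary search over per-tile start positions is needed, so the extra cost is proportional to the logarithm of the number of tiles rather than the number of k-mers.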
“…Karasikov et al. developed the Counting dBG [19] that stores differences between adjacent nodes in the dBG to compress metadata associated with nodes (and sequences) in a dBG. Encouragingly, much recent work on Spectrum Preserving String Sets (SPSS) that compactly index the set-membership of k-mers in reference texts has been introduced [20,21,22,17,23,24,25]. Although these approaches do not tackle the locate queries directly, they do suggest that even more efficient solutions for reference indexing are possible.…”
Section: Introduction (mentioning)
“…The first part is a succinct index data structure that answers whether a query k-mer is found in D and, if so, returns an integer identifier for the k-mer. This part of the index is implemented using the Spectral Burrows-Wheeler transform framework (Alanko et al., 2022), where the integer identifiers of the k-mers are the colexicographic ranks of the k-mers within the index. The second part is a succinct data structure that takes the identifier of the k-mer x from the first structure and uses it to retrieve the set of identifiers of the reference sequences that contain x.…”
Section: Methods (mentioning)
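The Methods excerpt describes a membership lookup that returns a k-mer's colexicographic rank. A minimal, unoptimized Python sketch of how such a lookup can work on an SBWT, following our reading of Alanko et al. (2022): k-mers without a predecessor are padded with `$`-prefixed dummy k-mers, the padded set is sorted colexicographically, and only the first k-mer in each group sharing its last k-1 characters stores its outgoing edge labels. Always including `$^k` (so that every C-array offset starts at 1) is an assumption made here. `build_sbwt`, `subset_rank`, and `lookup` are hypothetical names; `subset_rank` scans linearly where a succinct implementation would answer in constant time, and the returned rank also counts the dummy k-mers.

```python
ALPHABET = "ACGT"


def build_sbwt(kmers, k):
    """Return (padded k-mers in colex order, SBWT subset sequence)."""
    padded = set(kmers) | {"$" * k}  # assumption: always include $^k
    suffixes = {x[1:] for x in kmers}  # (k-1)-suffixes of the input k-mers
    for x in kmers:
        if x[:-1] not in suffixes:  # x has no predecessor in the spectrum
            # Add $^k, $^(k-1)x[0], ..., $x[:k-1] so that x becomes reachable.
            for i in range(k):
                padded.add("$" * (k - i) + x[:i])
    order = sorted(padded, key=lambda x: x[::-1])  # colex order; '$' < 'A'
    subsets = []
    for j, v in enumerate(order):
        # Only the first k-mer of each group sharing v[1:] stores the edge labels.
        if j == 0 or order[j - 1][1:] != v[1:]:
            subsets.append({c for c in ALPHABET if v[1:] + c in padded})
        else:
            subsets.append(set())
    return order, subsets


def subset_rank(subsets, c, i):
    """Number of subsets among the first i that contain character c."""
    return sum(1 for s in subsets[:i] if c in s)


def lookup(subsets, query, k):
    """Colexicographic rank of query among the padded k-mers, or None if absent."""
    if len(query) != k or any(c not in ALPHABET for c in query):
        return None
    # C[c]: position of the first k-mer ending in c; position 0 holds $^k.
    C, total = {}, 1
    for c in ALPHABET:
        C[c] = total
        total += subset_rank(subsets, c, len(subsets))
    lo, hi = 0, len(subsets) - 1  # inclusive interval of matching k-mers
    for c in query:
        lo = C[c] + subset_rank(subsets, c, lo)
        hi = C[c] + subset_rank(subsets, c, hi + 1) - 1
        if lo > hi:
            return None
    return lo


order, subsets = build_sbwt({"ACG", "CGT", "GTT"}, 3)
print(order)  # ['$$$', '$$A', '$AC', 'ACG', 'CGT', 'GTT']
print(lookup(subsets, "CGT", 3), lookup(subsets, "TTT", 3))  # 4 None
```

Each step of the loop maps the interval of k-mers ending in the query prefix to the interval ending in the prefix extended by one character, which is why only two subset rank queries per character are needed.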
“…
A  0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0
C  1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0
G  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1
T  0 0 1 1 0 0 1 1 1 1 1 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0
An entry of 1 in row i, column j indicates that the j-th k-mer has an outgoing edge such that the last character of the edge (k + 1)-mer is the i-th character of the alphabet. See Alanko et al. (2022) for a more in-depth explanation. The columns shaded in gray are the core k-mers, which are also marked in the bit vector below the SBWT matrix.…”
(mentioning)
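The matrix in the excerpt above stores the SBWT as one bit vector per alphabet character, so a subset rank query for character c over the first i columns reduces to an ordinary rank query on row c. A minimal sketch using the four rows as transcribed from the excerpt (the k-mer column labels and the gray shading did not survive extraction and are not used). `RankBitVector`, `matrix_rank`, and `subset_at` are hypothetical names, and the plain prefix-sum array is a simplification of the o(n)-bit rank structures a succinct implementation would use.

```python
# SBWT matrix rows as transcribed from the excerpt: one row per character,
# one column per k-mer in colexicographic order.
ROWS = {
    "A": "0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0",
    "C": "1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0",
    "G": "0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1",
    "T": "0 0 1 1 0 0 1 1 1 1 1 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0",
}


class RankBitVector:
    """Bit vector answering rank queries from a precomputed prefix-sum array.

    A succinct implementation would keep only o(n) extra bits (block and
    superblock counts); one integer per position is used here for clarity.
    """

    def __init__(self, bits):
        self.bits = bits
        self.prefix = [0]
        for b in bits:
            self.prefix.append(self.prefix[-1] + b)

    def rank1(self, i):
        """Number of 1-bits in positions 0..i-1."""
        return self.prefix[i]


matrix = {c: RankBitVector([int(b) for b in row.split()]) for c, row in ROWS.items()}


def matrix_rank(c, i):
    """Subset rank: how many of the first i columns contain character c."""
    return matrix[c].rank1(i)


def subset_at(j):
    """The subset of the alphabet stored in column j."""
    return {c for c, bv in matrix.items() if bv.bits[j]}


print(matrix_rank("A", 14), matrix_rank("T", 28))  # 2 13
print(sorted(subset_at(10)))  # ['A', 'T']
```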