2018
DOI: 10.1089/cmb.2017.0258

AllSome Sequence Bloom Trees

Abstract: The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propos…

Cited by 32 publications (22 citation statements)
References 38 publications
“…The resulting structure is usually referred to as a colored de Bruijn graph [19] and its representations have been widely studied ([50–61]). Even though we touched this setting in the section Multiple pan-genomes, exploiting the similarity between individual de Bruijn graphs for further compression in simplitig-based approaches is to be addressed in future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…Though inspired by the SBT and subsequent work, Mantis takes a completely different approach to this problem. Specifically, rather than adopting a hierarchy of Bloom filters, as suggested by previous approaches (Solomon and Kingsford, 2016, 2017; Sun et al, 2017), we build our system on top of the CQF (Pandey et al, 2017b), using this data structure both for counting and as a general key-value store. We combine this data structure with a color-encoding scheme similar to that adopted by Holley et al (2016) and Almodaresi et al (2017) for colored de Bruijn graph representation.…”
Section: Discussion (mentioning)
confidence: 99%
“…The resulting problem is coined as the experiment discovery problem, where the goal is to return all experiments that contain at least some user-defined q fraction of the k-mers present in the query string. The space and query time of the SBT structure has been further improved by Solomon and Kingsford (2017) and Sun et al (2017) by applying an All-Some set decomposition over the original sets of the SBT structure. This seminal work introduced both a formulation of this problem and the initial steps toward a solution.…”
Section: Introduction (mentioning)
confidence: 99%
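
The experiment discovery problem quoted above can be illustrated with a brute-force baseline. This is only a minimal sketch under stated assumptions: it stores each experiment as an exact Python set of k-mers rather than as Bloom filters arranged in a tree, so it does not represent the SBT or the AllSome SBT. The names `kmers`, `discover_experiments`, and `min_fraction` (standing in for the user-defined q fraction) are hypothetical and chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discover_experiments(query, experiments, k, min_fraction):
    """Return the names of experiments that contain at least
    min_fraction of the distinct k-mers in the query string.

    experiments maps a (hypothetical) experiment name to its exact
    k-mer set; a real index would use approximate membership filters.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: there is nothing to match against.
        return []
    hits = []
    for name, kmer_set in experiments.items():
        shared = len(query_kmers & kmer_set)
        if shared / len(query_kmers) >= min_fraction:
            hits.append(name)
    return hits


# Toy data, invented purely for illustration.
experiments = {
    "exp_a": kmers("ACGTACGTGA", 3),
    "exp_b": kmers("TTTTGGGG", 3),
}
print(discover_experiments("ACGTGA", experiments, k=3, min_fraction=0.75))
```

This baseline scans every experiment for every query. Per the statement above, the SBT and its All-Some refinement are aimed at reducing the space and query time of answering the same question.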
“…Those downsides are intensified in the colored de Bruijn graph for which the memory consumption of colors rapidly overtakes the vertices and edges memory usage [36]. For this reason, a lot of attention has been given to succinct data structures for building the colored de Bruijn graph [30,31,36–41] and data structures for multi-set k-mer indexing [42–47]. In the following, we focus on tools for constructing compacted de Bruijn graphs (cdBGs) with or without colors.…”
Section: Introduction (mentioning)
confidence: 99%