2019
DOI: 10.1101/695338
Preprint

Bifrost – Highly parallel construction and indexing of colored and compacted de Bruijn graphs

Abstract: Motivation: De Bruijn graphs are the core data structure for a wide range of assemblers and genome analysis software processing High Throughput Sequencing datasets. For population genomic analysis, the colored de Bruijn graph is often used in order to take advantage of the massive sets of sequenced genomes available for each species. However, memory consumption of tools based on the de Bruijn graph is often prohibitive, due to the high number of vertices, edges or colors in the graph. In order to process large…

Cited by 23 publications (28 citation statements)
References 66 publications
“…In fact, for any concrete application, one might argue that a SPSS representation is too restrictive and can be improved. However, we chose to focus on SPSS representations because they are the common denominator in the applications of unitig-based representations we have observed [11, 15–17]. In this way, they retain broad applicability, as opposed to more specialized representations.…”
Section: Results (mentioning)
confidence: 99%
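
The statement above refers to SPSS (spectrum-preserving string set) representations without defining them. The sketch below is an illustration only, under an assumed working definition: a set of strings, each of length at least k, whose k-mers are exactly the target k-mer set, with every k-mer occurring once. The function names `kmers` and `is_spss` are hypothetical, and reverse complements are ignored for simplicity; this is not taken from the cited paper's code.

```python
from collections import Counter


def kmers(s, k):
    """Return all k-mers of string s, in order of occurrence."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def is_spss(strings, kmer_set, k):
    """Check the assumed SPSS property: every string has length >= k,
    the k-mers of the strings equal kmer_set, and no k-mer is repeated.
    Reverse complements are NOT considered (simplifying assumption)."""
    seen = Counter()
    for s in strings:
        if len(s) < k:
            return False
        seen.update(kmers(s, k))
    same_spectrum = set(seen) == set(kmer_set)
    no_repeats = all(count == 1 for count in seen.values())
    return same_spectrum and no_repeats


target = {"ACG", "CGT", "GTA"}
single_string = is_spss(["ACGTA"], target, 3)
one_per_kmer = is_spss(["ACG", "CGT", "GTA"], target, 3)
missing_kmer = is_spss(["ACGT"], target, 3)
repeated_kmer = is_spss(["ACGTA", "CGT"], target, 3)
```

Under this definition, both the single merged string and the one-string-per-k-mer set qualify; they differ only in total length, which is the quantity a compact representation tries to reduce.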
“…We did not compare against the Bloom filter trie [41], which is fast but uses an order of magnitude more memory than BOSS [40]. Other data structures, such as Pufferfish [15], blight [16], and Bifrost [17], implement more sophisticated operations and hence use significantly more memory than BOSS. Moreover, these make use of a unitig SPSS representation and hence could potentially themselves incorporate the UST approach.…”
Section: Evaluation of UST-FM (mentioning)
confidence: 99%
“…kallisto have a heuristic based on the dBG to avoid looking up every k-mer, which we have not reimplemented as of now. We have also experimented with another library to build the dBG called bifrost (Holley, 2019), which is slightly faster, presumably because of the rolling hash they use to lookup k-mers. The built dBGs were essentially the same and therefore the classifying performance of Brume was unchanged.…”
Section: Discussion (mentioning)
confidence: 99%
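
The statement above attributes the speed difference to a rolling hash used for k-mer lookup. As a rough illustration of the general idea, the sketch below computes a polynomial rolling hash over all k-mers of a sequence, updating each hash from the previous one in constant time instead of rehashing k characters. It is a generic textbook scheme, not Bifrost's actual hash function; the names `rolling_kmer_hashes` and `direct_hash` and the parameters `base` and `mod` are assumptions made for this example.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
MOD = (1 << 61) - 1  # large prime modulus (assumed parameter)
BASE = 4             # one digit per nucleotide (assumed parameter)


def direct_hash(kmer, base=BASE, mod=MOD):
    """Hash a k-mer from scratch in O(k) time."""
    h = 0
    for ch in kmer:
        h = (h * base + CODE[ch]) % mod
    return h


def rolling_kmer_hashes(seq, k, base=BASE, mod=MOD):
    """Yield (position, hash) for every k-mer of seq.
    After the first k-mer, each hash is derived from the previous one
    by removing the outgoing character and appending the incoming one."""
    if k <= 0 or len(seq) < k:
        return
    top = pow(base, k - 1, mod)  # weight of the outgoing character
    h = direct_hash(seq[:k], base, mod)
    yield 0, h
    for i in range(k, len(seq)):
        h = (h - CODE[seq[i - k]] * top) % mod  # drop leftmost character
        h = (h * base + CODE[seq[i]]) % mod     # append new character
        yield i - k + 1, h


seq = "ACGTACGTTGCA"
k = 4
rolled = list(rolling_kmer_hashes(seq, k))
expected = [(i, direct_hash(seq[i:i + k])) for i in range(len(seq) - k + 1)]
```

Here `rolled` matches `expected`, so the constant-time update yields the same values as hashing each k-mer from scratch. A production k-mer index would typically also handle reverse complements and non-ACGT characters, which this sketch omits.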
“…To that extent, we implemented a user-friendly library along with different snippets to allow our method to be usable in practical cases. The challenge of indexing colored de Bruijn graphs [36] (or more generally to answer large sequence search problems as defined in [10]) have caught the interest of a community and could be a direct application of this work. As an example, BLight is successfully integrated as an indexing structure in REINDEER [34], a k-mer data structure that enables the quantification of query sequences in thousands of raw read samples.…”
Section: Discussion (mentioning)
confidence: 99%