2021
DOI: 10.1109/tcbb.2019.2913932

deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph

Abstract: Motivation: De Bruijn graph, a fundamental data structure to represent and organize genome sequence, plays important roles in various kinds of sequence analysis tasks such as de novo assembly, high-throughput sequencing (HTS) read alignment, pan-genome analysis, metagenomics analysis, HTS read correction, etc. With the rapid development of HTS data and ever-increasing number of assembled genomes, there is a high demand to construct de Bruijn graph for sequences up to Tera-basepair level. It is non-trivial sinc…

Citations: cited by 17 publications (18 citation statements)
References: 43 publications
“…As a downside, the naive implementation of the heuristic using a standard hashtable may run into memory issues. In our work, we have not encountered this, but memory consumption can be readily improved using more advanced data structures, similarly to what has been done for tools for unitig computation [33,46,47] . We note that ProphAsm is a spin-off of the ProPhyle software ( https://prophyle.github.io/ , [27] ) for phylogeny-based metagenomic classification.…”
Section: Discussion (mentioning)
confidence: 96%
“…Note that BCALM2 can process assembled genomes as well as short read data. deGSM [50] performs an external sorting of the k-mers from the input sequences and then constructs a Burrows-Wheeler transform (BWT) [51] of the unitigs from which the final graph is extracted. SplitMEM [30] uses the suffix tree [52] to construct a ccdBG.…”
Section: Introduction (mentioning)
confidence: 99%
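
The external-sorting step mentioned in the statement above can be illustrated with a small sketch. This is not deGSM's implementation: it assumes canonical k-mers over the alphabet ACGT, uses in-memory lists as stand-ins for on-disk sorted runs, and the names `sorted_runs`, `distinct_sorted_kmers`, and `run_size` are hypothetical. It shows only the general pattern of sorting bounded-size runs and then merging them into one duplicate-free sorted stream.

```python
import heapq
from itertools import groupby

_COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)

def kmers(seq, k):
    """Yield every canonical k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])

def sorted_runs(seqs, k, run_size):
    """Cut the k-mer stream into runs of at most run_size items and sort each.

    In a real external sort each run would be written to disk; here the
    runs are plain lists (an assumption made to keep the sketch short).
    """
    run = []
    for seq in seqs:
        for km in kmers(seq, k):
            run.append(km)
            if len(run) >= run_size:
                yield sorted(run)
                run = []
    if run:
        yield sorted(run)

def distinct_sorted_kmers(seqs, k, run_size=4):
    """Merge the sorted runs and drop duplicates, keeping sorted order."""
    runs = list(sorted_runs(seqs, k, run_size))
    merged = heapq.merge(*runs)  # k-way merge of already sorted runs
    return [km for km, _ in groupby(merged)]

result = distinct_sorted_kmers(["ACGTAC", "GTACGG"], k=3, run_size=4)
```

Because only one run is held in memory while it is sorted, peak memory in a disk-backed version of this pattern is governed by `run_size` rather than by the total number of k-mers.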
“…The maximal unitigs U can be computed efficiently [12][13][14] and combined with an auxiliary index to obtain a membership data structure (i.e. one that can efficiently determine if a k-mer belongs to K or not).…”
Section: Introduction (mentioning)
confidence: 99%
“…one that can efficiently determine if a k-mer belongs to K or not). In particular, Unitigs-FM [11] and deGSM [14] uses the FM-index as the auxiliary index, Pufferfish [15] and BLight [16] uses a minimum perfect hash function, and Bifrost [17] uses a minimizer hash table. Alternatively, U can be compressed to obtain a compressed disk representation of K, albeit without efficient support for membership queries prior to decompression.…”
Section: Introduction (mentioning)
confidence: 99%
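
The idea of pairing unitigs with an auxiliary index to answer k-mer membership queries can be sketched as follows. This is a deliberately simplified illustration: a plain Python dictionary stands in for the FM-index, minimal perfect hash function, or minimizer hash table named in the statement above, and the helper names `build_index` and `contains` are hypothetical. Canonical k-mers over ACGT are also an assumption.

```python
_COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)

def build_index(unitigs, k):
    """Map each canonical k-mer to the (unitig id, offset) of its first occurrence.

    A dict is used only as a stand-in for a compact auxiliary index.
    """
    index = {}
    for uid, unitig in enumerate(unitigs):
        for off in range(len(unitig) - k + 1):
            index.setdefault(canonical(unitig[off:off + k]), (uid, off))
    return index

def contains(index, kmer):
    """Membership query: is this k-mer (in either orientation) in the set?"""
    return canonical(kmer) in index

unitigs = ["ACGTT", "GGA"]
index = build_index(unitigs, k=3)
```

With this toy input, `contains(index, "GTT")` and `contains(index, "AAC")` both succeed because the two strings are reverse complements of each other, while a k-mer absent from every unitig, such as `"TTT"`, is rejected.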