2022
DOI: 10.1101/2022.10.24.513174
Preprint

Extremely-fast construction and querying of compacted and colored de Bruijn graphs with GGCAT

Abstract: Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences, and associate to each k-mer the sequences in which it appears. Here we present GGCAT, a tool for constructing both types of graphs. Compared to Cuttlefish 2 (Genome Biology, 2022), the state-of-the-art for constructing compacted de Bruijn graphs, GGCAT has a speedup of up to 3.4x for k = 63 and up to 20.8x for k = 255. Com…
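
As an illustration of the "colored" idea in the abstract (each k-mer is associated with the sequences in which it appears), here is a minimal Python sketch. It is not GGCAT's implementation: the function name `colored_kmers` is hypothetical, colors are simply the positions of the input sequences, and reverse complements and graph compaction are ignored.

```python
from collections import defaultdict


def colored_kmers(sequences, k):
    """Map each k-mer to the set of input sequences (colors) containing it.

    Assumption: forward strand only, no canonical k-mers, everything in memory.
    """
    colors = defaultdict(set)
    for color, seq in enumerate(sequences):
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return dict(colors)


index = colored_kmers(["ACGTA", "CGTAC"], k=3)
print(index["CGT"])  # {0, 1}: this k-mer occurs in both sequences
```

A k-mer shared by several inputs carries several colors, while a k-mer unique to one input carries exactly one.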

Cited by 15 publications (29 citation statements)
References 49 publications
“…Kmer-based methods have found wide-spread use in many areas of bioinformatics over the past years. However, they usually rely on unitigs to represent the kmer sets, since they can be computed efficiently with standard tools [33, 23, 40, 41]. Unitigs have the additional property that the de Bruijn graph topology can easily be reconstructed from them, since they do not contain branching nodes other than on their first and last kmer.…”
Section: Discussion (mentioning)
confidence: 99%
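
The unitig property quoted in the statement above (no branching nodes other than on the first and last k-mer) can be sketched in Python as follows. This is a simplified illustration, not the algorithm used by GGCAT or the other cited tools: `build_unitigs` is a hypothetical name, the graph is node-centric and forward-strand only, and everything is held in memory.

```python
def build_unitigs(kmers):
    """Merge k-mers into maximal non-branching paths (unitigs).

    Assumption: forward strand only; reverse complements are ignored.
    """
    kmers = set(kmers)

    def successors(x):
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def predecessors(x):
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    seen, unitigs = set(), []
    for start in sorted(kmers):
        if start in seen:
            continue
        # Walk backwards while the link to the predecessor is unambiguous.
        first = start
        while True:
            preds = predecessors(first)
            if len(preds) != 1 or len(successors(preds[0])) != 1 or preds[0] == start:
                break
            first = preds[0]
        # Walk forwards from the path start, stopping at any branch or cycle.
        path = [first]
        while True:
            succs = successors(path[-1])
            if len(succs) != 1 or len(predecessors(succs[0])) != 1 or succs[0] == first:
                break
            path.append(succs[0])
        seen.update(path)
        # Consecutive k-mers overlap by k-1 characters, so append last characters only.
        unitigs.append(path[0] + "".join(x[-1] for x in path[1:]))
    return unitigs


result = build_unitigs(["ACG", "CGT", "GTA", "GTC"])
print(result)  # ['ACGT', 'GTA', 'GTC']
```

In this example the k-mer CGT has two successors, so the first unitig ends there and each branch starts its own unitig.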
“…In that work, the size of the SPSS is very minor compared to the size of the index, however, major components of the index may be smaller if the SPSS contains less strings, which can be achieved by using greedy matchtigs. Our algorithms were also integrated into the external-memory de Bruijn graph compactor GGCAT [41], which was easy to do [5]. [4] While this paper was under review, Schmidt and Alanko realised that the algorithm to compute matchtigs can also be used to compute optimal simplitigs, by leaving out all parts related to repeating kmers.…”
Section: Introduction (mentioning)
confidence: 99%
“…We show that our approach is ideal for pseudoaligning Nanopore long-read sequencing data, where the previous methods struggle, while simultaneously achieving rapid query times and small index size. Our implementation also provides an efficient way to construct the index making use of recent advances on colored unitig extraction algorithms (Cracco and Tomescu, 2022) and is an order of magnitude faster than Bifrost and Metagraph for reference databases containing 100,000 or more bacterial genomes. These factors enable Themisto to leverage much larger databases than previous methods, thus representing a significant methodological advance in pseudoalignment.…”
Section: Introduction (mentioning)
confidence: 99%