2023
DOI: 10.1101/gr.277615.122

Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT

Abstract: Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences, and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step, and on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3…
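The abstract names two structures: a compacted de Bruijn graph, in which every maximal non-branching path of k-mers (a unitig) is merged into one sequence, and a colored variant that records, for each k-mer, which input sequences contain it. Below is a minimal, naive in-memory sketch of those two definitions in Python. It is not GGCAT's algorithm, which merges k-mer counting with unitig construction and is engineered for very large inputs. Assumptions made here for brevity: only the forward strand is considered (real tools typically work with canonical k-mers and reverse complements), the alphabet is ACGT, every distinct k-mer is kept (no abundance filtering), colors are plain sequence indices, and all function names are hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: uppercase DNA only


def build_colored_kmers(sequences, k):
    """Map each k-mer to the set of sequence indices (colors) it occurs in."""
    colors = defaultdict(set)
    for color, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return dict(colors)


def successors(kmer, kmers):
    # Forward strand only: a successor overlaps kmer by k-1 characters.
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer, kmers):
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def next_in_unitig(kmer, kmers):
    """Return the successor if the edge kmer -> succ is non-branching, else None."""
    succ = successors(kmer, kmers)
    if len(succ) != 1 or succ[0] == kmer:  # branch, dead end, or self-loop
        return None
    if len(predecessors(succ[0], kmers)) != 1:  # successor has another in-edge
        return None
    return succ[0]


def prev_in_unitig(kmer, kmers):
    """Mirror of next_in_unitig for walking backwards."""
    pred = predecessors(kmer, kmers)
    if len(pred) != 1 or pred[0] == kmer:
        return None
    if len(successors(pred[0], kmers)) != 1:
        return None
    return pred[0]


def compact(kmers):
    """Partition the k-mer set into maximal unitigs (naive, hypothetical helper)."""
    visited = set()
    unitigs = []
    for x in sorted(kmers):  # sorted only to make the output deterministic
        if x in visited:
            continue
        # Walk back to the start of the unitig containing x. Stopping when we
        # return to x handles isolated cycles, where every k-mer has in = out = 1.
        start = x
        while True:
            p = prev_in_unitig(start, kmers)
            if p is None or p == x or p in visited:
                break
            start = p
        # Walk forward, collecting k-mers until the path branches or repeats.
        path = [start]
        visited.add(start)
        while True:
            nxt = next_in_unitig(path[-1], kmers)
            if nxt is None or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
        # Consecutive k-mers overlap by k-1, so each adds one character.
        unitigs.append(path[0] + "".join(km[-1] for km in path[1:]))
    return unitigs


if __name__ == "__main__":
    seqs = ["ACGTT", "ACGAA"]
    colors = build_colored_kmers(seqs, 3)
    print(compact(colors))  # ['ACG', 'CGAA', 'CGTT']
    print(colors["ACG"])    # {0, 1}
```

In this toy input the k-mer `ACG` occurs in both sequences (colors {0, 1}) and has two successors, so it forms a unitig on its own, while `CGAA` and `CGTT` are non-branching paths whose k-mers each belong to a single sequence.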

Cited by 19 publications (17 citation statements)
References 45 publications
“…We evaluated the performances of kmindex together with eight state-of-the-art k-mer indexers: themisto [2]; ggcat [7]; HIBF [18]; PAC [17]; MetaProFi [22]; MetaGraph [13]; Bifrost [12]; and COBS [3]. The dataset for this benchmark is composed of metagenomic seawater sequencing data from 50 Tara Oceans samples, of 1.4TB of gzipped fastq files.…”
Section: Comparative Results Indexing 50 Metagenomic Seawater Samples
Citation type: mentioning (confidence: 99%)
“…The task of counting maximal unitigs for all k values takes O(|V_U|) time when using a Prokrustean graph (section 2.2). We compare with GGCAT [8], an efficient compacted de Bruijn graph generating algorithm. Fig. 3: Counting maximal unitigs with a range of k. The Prokrustean approach is orders of magnitude faster than GGCAT, which is not designed for this task and was called each k separately to compute de Bruijn graphs.…”
Section: Application: Counting Maximal Unitigs Of De Bruijn Graphs Fo...
Citation type: mentioning (confidence: 99%)
“…Fig. 1: The performance of Prokrustean graphs for two representative functionalities were compared with state-of-the-arts: KMC [15] is a k-mer counting tool and GGCAT [8] constructs compacted de Bruijn graphs. Both were iterative called to extract k-mer/unitig counts of k = 30.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…It included assemblies of over 300,000 genomes which had not previously been available (the raw data only had been available). The assemblies and search indexes allowed multiple other studies of plasmids [5,6], bacterial adaptation [7,8,9,10], and compression/indexing algorithms [11,12,13,14,15]. However, there were a few limitations.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)