2022
DOI: 10.1101/gr.276607.122

Lossless indexing with counting de Bruijn graphs

Abstract: Sequencing data are rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in building compressed representations of annotated (or colored) de Bruijn graphs for efficiently indexing k-mer sets. However, approaches for representing quantitative attributes such as gene expression or genome positions in a general manner have remained underexplored. In thi…
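To make the idea of attaching quantitative attributes to a de Bruijn graph more concrete, here is a toy, uncompressed sketch. It is not the paper's method (the abstract is cut off before the method is described): it only stores, for each k-mer, a per-sample occurrence count, with graph edges implied by (k-1)-overlaps between k-mers. The function names `build_counting_index` and `successors` are hypothetical.

```python
def build_counting_index(samples, k):
    """Toy, uncompressed index: map each k-mer to {sample_label: count}.

    Hypothetical helper for illustration; real tools use compressed
    graph and annotation representations instead of a Python dict.
    """
    index = {}
    for label, seqs in samples.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                counts = index.setdefault(seq[i:i + k], {})
                counts[label] = counts.get(label, 0) + 1
    return index


def successors(index, kmer):
    """De Bruijn edges are implicit: kmer -> kmer[1:] + c if that k-mer exists."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in index]


samples = {"sampleA": ["ACGTACGT"], "sampleB": ["CGTACG", "ACGTT"]}
idx = build_counting_index(samples, k=3)
print(idx["ACG"])               # {'sampleA': 2, 'sampleB': 2}
print(successors(idx, "ACG"))   # ['CGT']
```

Storing counts rather than plain presence bits is what distinguishes this from a binary (colored) annotation; the compression of such count annotations is the hard part that this sketch deliberately ignores.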

Cited by 19 publications (45 citation statements); references 51 publications.
“…For compatibility with minigraph, we use GetBlunted to derive a variation graph from the De Bruijn graph [17]. As an additional evaluation, we align the test reads to the original virus reference genomes using the TCG-Aligner [33] (the basis for MetaGraph-LA) to determine reference values for alignment accuracy during our experiments. See Supplementary Table A1 for statistics about these graphs.…”
Section: Evaluation Methodology (mentioning; confidence: 99%)
“…A chain is a series of anchors that appear in the correct order with respect to the query such that each anchor can reach the subsequent anchor in the chain via graph traversal. A chain is scored more favorably if it contains more anchors and is penalized if the distances between the anchors in the query differ from their corresponding graph traversal distances [39,33,2].…”
Section: Sequence-to-graph Alignment (mentioning; confidence: 99%)
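The chaining criterion in the statement above can be illustrated with a simple O(n²) dynamic program. This sketch assumes, as a simplification, that every anchor carries a linear graph coordinate, so the traversal distance between two anchors is just the difference of their coordinates; actual sequence-to-graph aligners compute distances over the graph itself. The reward (anchor length), the linear discrepancy penalty, and the name `chain_anchors` are illustrative assumptions, not the scoring used by any of the cited tools.

```python
def chain_anchors(anchors, gap_penalty=1.0):
    """Co-linear chaining of (query_pos, graph_pos, length) anchors.

    Assumes graph_pos is a linear coordinate, so graph distance is a
    coordinate difference (a stand-in for real graph traversal distance).
    Returns (best_score, best_chain).
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    # score[i]: best chain score ending at anchor i; prev[i]: its predecessor
    score = [float(length) for _, _, length in anchors]
    prev = [-1] * n
    for i, (qi, gi, li) in enumerate(anchors):
        for j in range(i):
            qj, gj, lj = anchors[j]
            # Anchor j must end before anchor i starts in both query and graph
            if qj + lj <= qi and gj + lj <= gi:
                # Penalize disagreement between query and graph distances
                discrepancy = abs((qi - qj) - (gi - gj))
                candidate = score[j] + li - gap_penalty * discrepancy
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    best = max(range(n), key=score.__getitem__)
    best_score = score[best]
    chain = []
    while best != -1:
        chain.append(anchors[best])
        best = prev[best]
    return best_score, chain[::-1]


anchors = [(0, 100, 5), (10, 110, 5), (20, 120, 5), (12, 500, 5)]
print(chain_anchors(anchors))
# (15.0, [(0, 100, 5), (10, 110, 5), (20, 120, 5)])
```

More anchors raise the score, while the anchor at graph position 500 is left out because its query and graph distances to the other anchors disagree by hundreds of bases.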
“…Since the time complexity of optimal sequence-to-graph alignment grows linearly with the number of edges in the graph [20,16], many approaches instead follow an approximate seed-and-extend strategy [2], which operates in four main steps: i) seed extraction, which in its simplest form involves finding all substrings with a certain length, ii) seed anchoring, finding matching nodes in the graph, iii) seed filtration, often involving clustering [9,37] or co-linear chaining [25,1,32,8] of seeds, and iv) seed extension, involving performing semi-global pairwise sequence alignment forwards and backwards from each anchored seed [28]. We will review the usage of exact seeds utilized in tools such as vg [15] and GraphAligner [37] and discuss their limitations in a high mutation-rate setting.…”
Section: Introduction (mentioning; confidence: 99%)
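Steps i) and ii) of the seed-and-extend strategy can be sketched with exact k-mer seeds matched against a toy node index in which each distinct reference k-mer stands for one graph node. The helper names and the dictionary-based index are hypothetical; vg and GraphAligner use their own, far more elaborate index structures. The second query illustrates why exact seeds struggle at high mutation rates: a single substitution destroys every exact seed that overlaps it.

```python
def extract_seeds(query, k):
    """Step i: all length-k substrings of the query with their offsets."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


def build_node_index(references, k):
    """Hypothetical toy index: each distinct reference k-mer is one node id."""
    index = {}
    for ref in references:
        for i in range(len(ref) - k + 1):
            # New k-mers get the next free id; repeats keep their first id
            index.setdefault(ref[i:i + k], len(index))
    return index


def anchor_seeds(seeds, node_index):
    """Step ii: keep (query_offset, node_id) for seeds found in the graph."""
    return [(pos, node_index[kmer]) for pos, kmer in seeds if kmer in node_index]


k = 4
nodes = build_node_index(["ACGTACGGA"], k)
print(anchor_seeds(extract_seeds("TTACGGAT", k), nodes))  # [(1, 3), (2, 4), (3, 5)]
# One substitution (T->A at offset 3) removes all four seeds that overlap it
print(anchor_seeds(extract_seeds("ACGAACGG", k), nodes))  # [(4, 4)]
```

The surviving anchors would then be passed to filtration (for example, chaining) and extension, which this sketch does not cover.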