2022
DOI: 10.1101/2022.11.04.514718
Preprint

Label-guided seed-chain-extend alignment on annotated De Bruijn graphs

Abstract: The amount of data stored in genomic sequence databases is growing exponentially, far exceeding traditional indexing strategies' processing capabilities. Many recent indexing methods organize sequence data into a sequence graph to succinctly represent large genomic data sets from reference genome and sequencing read set databases. These methods typically use De Bruijn graphs as the graph model or the underlying index model, with auxiliary graph annotation data structures to associate graph nodes with various m…

Cited by 2 publications (4 citation statements)
References 101 publications
“…Then, the respective annotations are retrieved and the aggregated result is returned as output (Extended Data Figure 1c). For increased sensitivity, we developed algorithms for sequence-to-graph alignment 40,46, which identify the closest matching path in the whole graph (Extended Data Figures 1d; Methods). We also designed a batch query algorithm (schematic in Extended Data Figure 1e, Methods), exploiting the presence of k-mers shared between individual queries by forming a fast intermediate query subgraph, that increases throughput up to 100-fold for large repetitive queries (e.g., sets of sequencing reads).…”
Section: Results (mentioning)
confidence: 99%
“…When label recombination is not desired, we support an alternative approach where queries are aligned to subgraphs of the joint graph induced by single annotation labels (columns of the annotation matrix). We call this approach label-consistent graph alignment (or alignment to columns) and is implemented by the MetaGraph-LA algorithm 46. However, instead of aligning to all the subgraphs independently, we perform the alignment with a single search procedure while keeping track of the annotations corresponding to the alignments.…”
Section: Methods (mentioning)
confidence: 99%
“…Such exhaustive approaches, sans additional theoretical insights, are unlikely to achieve a significant breakthrough due to inherent computational burden of storing or operating over the representations. Many other tasks addressing similar issues include genome alignment [16, 18, 21] and error correction [1].…”
Section: Introduction (mentioning)
confidence: 99%