2021
DOI: 10.1093/bioinformatics/btab330

Topology-based sparsification of graph annotations

Abstract: Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently-used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rap…

Citations: cited by 6 publications (16 citation statements)
References: 24 publications
“…Depending on the number of k-mers and files, this matrix can have up to ∼10^12 rows (corresponding to distinct k-mers) and ∼10^7 columns (corresponding to different files or, in general, labels) [22]. However, it can be highly compressed thanks to its sparsity [32,3,23,2,14].…”
Section: Graph Annotations (mentioning)
confidence: 99%
“…Leveraging similarity of annotations of neighboring nodes. For the case of binary annotations, transformations assuming likely similarity between annotations of adjacent nodes in the graph and replacing them with relative differences have been explored in Mantis-MST [2] and RowDiff [14]. The RowDiff algorithm conceptually consists of two parts.…”
Section: Diff-compression Of Extended Graph Annotations (mentioning)
confidence: 99%
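The idea quoted above, replacing a node's annotation with its difference from an adjacent node's annotation, can be illustrated with a small sketch. This is not the RowDiff algorithm as published (the quoted description is truncated), only a minimal illustration under stated assumptions: each node has a single designated successor, every successor chain ends at an "anchor" node that keeps its full row, and the names `diff_encode`, `diff_decode`, `successor`, and `anchors` are hypothetical.

```python
def diff_encode(rows, successor, anchors):
    """Store each non-anchor row as its symmetric difference with the successor's row.

    rows: dict node -> set of labels (the full annotation)
    successor: dict node -> designated adjacent node (hypothetical choice)
    anchors: set of nodes whose rows are stored in full
    """
    encoded = {}
    for node, labels in rows.items():
        if node in anchors:
            encoded[node] = set(labels)
        else:
            encoded[node] = labels ^ rows[successor[node]]
    return encoded


def diff_decode(encoded, successor, anchors, node):
    """Rebuild a full row by XOR-ing the stored differences along the path to an anchor."""
    labels = set()
    while True:
        labels ^= encoded[node]
        if node in anchors:
            return labels
        node = successor[node]


# Toy path 0 -> 1 -> 2 -> 3 with node 3 as the anchor (illustrative data only).
rows = {0: {"A", "B"}, 1: {"A", "B"}, 2: {"A", "B", "C"}, 3: {"A", "B", "C"}}
successor = {0: 1, 1: 2, 2: 3}
anchors = {3}

encoded = diff_encode(rows, successor, anchors)
stored_before = sum(len(r) for r in rows.values())
stored_after = sum(len(r) for r in encoded.values())
```

When neighboring rows are similar, most stored differences are empty or small, which is why the transformed matrix tends to be sparser than the original. The cost is that decoding a row requires walking to an anchor.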
“…Approaches for representing relations between k-mers and input files have been extensively explored in the past decade [20,32,3,23,2,14]. Motivated by the experiment discovery problem, which is to find a sequencing library within a large collection based on a query pattern, these methods encode binary metadata attributes (e.g., the membership of a k-mer to a certain sequence or file) in a sparse binary matrix.…”
Section: Graph Annotations (mentioning)
confidence: 99%
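The binary k-mer-to-file relation described in this statement can be sketched as follows. This is a toy illustration, not the data structure of any of the cited tools: it stores each row of the sparse binary matrix as the set of labels whose bit is set, and the helper names (`kmers`, `build_annotation`, `query`) and sample data are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_annotation(files, k):
    """Map each distinct k-mer (row) to the set of labels (columns) that contain it."""
    rows = defaultdict(set)
    for label, sequences in files.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                rows[kmer].add(label)
    return dict(rows)


def query(annotation, pattern, k):
    """Labels that contain every k-mer of the query pattern."""
    result = None
    for kmer in kmers(pattern, k):
        labels = annotation.get(kmer, set())
        result = set(labels) if result is None else result & labels
    return result or set()


# Two toy "files", each holding one short sequence (illustrative data only).
files = {"sample_A": ["ACGTAC"], "sample_B": ["GTACC"]}
annotation = build_annotation(files, 3)
hits = query(annotation, "ACGT", 3)
```

Storing only the set bits per row is what makes the representation cheap when most k-mers occur in few files. Real collections at the scale quoted above require succinct encodings rather than Python sets.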