2014
DOI: 10.1101/001669
Preprint

Algorithms in Stringomics (I): Pattern-Matching against “Stringomes”

Abstract: This paper reports an initial design of new data structures that generalize the idea of pattern matching in stringology from its traditional usage in an (unstructured) set of strings to the arena of a well-structured family of strings. In particular, the object of interest is a family of strings composed of blocks/classes of highly similar "stringlets," thus mimicking a population of genomes made by concatenating haplotype blocks, further constrained by haplotype phasing. Such a family of strings, which we d…
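The abstract is cut off before it describes the proposed data structures, so the sketch below only makes the setting concrete. It assumes a hypothetical toy family of strings built from blocks of similar stringlets, plus a hypothetical list of admissible (phased) block choices, and searches it by brute force. The names `blocks`, `phased`, `spell`, and `naive_match` are illustrative assumptions, and the scan over every concatenation is the naive baseline a structured index would aim to beat, not the paper's method.

```python
# Hypothetical toy stringome: each block is a class of highly similar
# "stringlets" (e.g., haplotype variants of one genomic region).
blocks = [
    ["ACGT", "ACGA"],   # block 0
    ["TTG", "TCG"],     # block 1
    ["CAAC", "CATC"],   # block 2
]

# Hypothetical phasing constraint: only these choices of one stringlet
# per block are admissible; each tuple describes one "genome".
phased = [(0, 0, 1), (1, 1, 0), (0, 1, 1)]


def spell(choice):
    """Concatenate the chosen stringlet from every block, in block order."""
    return "".join(blocks[b][i] for b, i in enumerate(choice))


def naive_match(pattern):
    """Brute-force baseline: scan every admissible concatenation.

    Returns (genome_index, start_offset) pairs. The cost grows with the
    number of genomes times their length, ignoring the shared blocks.
    """
    hits = []
    for g, choice in enumerate(phased):
        s = spell(choice)
        start = s.find(pattern)
        while start != -1:
            hits.append((g, start))
            start = s.find(pattern, start + 1)
    return hits
```

For example, `naive_match("GTT")` reports matches in genomes 0 and 2 that straddle the boundary between block 0 and block 1, the kind of occurrence a block-aware index has to handle.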

Cited by 6 publications (9 citation statements)
References 31 publications
“…Prior to the predominant prefix-sorting approach that we are going to discuss in detail in the next subsections, the problem of solving indexed path queries on labeled graphs has been tackled in the literature by resorting to geometric data structures [61,62]. These solutions work in the hypertext model: the objects being indexed are node-labeled graphs $G = (V, E, \Sigma, \lambda)$, where the function $\lambda : V \to \Sigma^*$ associates a string to each node (note the difference with our edge-labeled model, where each edge is labeled with a single character).…”
Section: Hypertext Indexing (mentioning)
confidence: 99%
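The contrast between the hypertext (node-labeled) model and an edge-labeled model with one character per edge can be made concrete with a small conversion. The sketch below assumes every label is non-empty and node names are strings. It first splits each node into a chain of single-character nodes, then moves each character onto the edges entering it, adding a hypothetical extra source node `src` so that chain starts without predecessors can still be read. The function names are illustrative and are not taken from [61,62].

```python
from typing import Dict, List, Tuple

NodeLabels = Dict[str, str]       # lambda: V -> Sigma*
Edges = List[Tuple[str, str]]     # directed edges (u, v)


def split_to_characters(labels: NodeLabels, edges: Edges):
    """Expand every node u into a chain of single-character nodes (u, i).

    Assumes every label is non-empty. Returns (char_of, char_edges), where
    char_of maps each new node to the character it carries.
    """
    char_of = {}
    char_edges = []
    for u, s in labels.items():
        for i, c in enumerate(s):
            char_of[(u, i)] = c
            if i > 0:
                char_edges.append(((u, i - 1), (u, i)))  # edge inside the label
    for u, v in edges:
        # Original edge: last character of lambda(u) -> first character of lambda(v).
        char_edges.append(((u, len(labels[u]) - 1), (v, 0)))
    return char_of, char_edges


def to_edge_labeled(char_of, char_edges, source="src"):
    """Move each node's character onto its incoming edges.

    The hypothetical `source` node feeds every character node that has no
    predecessor, so every character is read on at least one edge.
    """
    has_pred = {y for _, y in char_edges}
    labeled = [(x, y, char_of[y]) for x, y in char_edges]
    labeled += [(source, y, c) for y, c in char_of.items() if y not in has_pred]
    return labeled
```

With labels `{"a": "AC", "b": "GT"}` and the edge `("a", "b")`, the string "ACG" is spelled by the edge walk src → (a, 0) → (a, 1) → (b, 0). Note that the conversion multiplies the number of nodes by the average label length, which is why the hypertext model is attractive when labels are very long.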
“…This labeled graph model is well suited for applications where the strings labeling each node are very long (for example, a transcriptome), in which case the label component (rather than the graph's topology) dominates the data structure's space. Both solutions discussed in Reference [61,62] resort to geometric data structures. First, a classic text index (for example, a compressed suffix array) is built over the concatenation $\lambda(u_1) \cdot \# \cdots \# \cdot \lambda(u_n)$ of the strings labeling all the graph's nodes $u_1$, …”
Section: Hypertext Indexing (mentioning)
confidence: 99%
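The first step described in this excerpt (a text index over the separator-delimited concatenation of node labels, with hits mapped back to nodes) can be sketched as follows. A naively sorted suffix array stands in for the compressed suffix array mentioned in the excerpt, and the separator `#` is assumed not to occur in any label or pattern; `build_index` and `find_in_nodes` are hypothetical names. Only occurrences lying inside a single node are found, since any match crossing a node boundary in the concatenation would have to contain `#`.

```python
import bisect
from typing import List, Tuple

SEP = "#"  # assumed not to occur in any node label or in the pattern


def build_index(labels: List[str]):
    """Concatenate lambda(u_1) # ... # lambda(u_n) and suffix-sort it.

    Plain sorting of suffixes stands in for a real (compressed) suffix
    array construction; it is fine for toy inputs only.
    """
    text = SEP.join(labels)
    starts, pos = [], 0
    for s in labels:
        starts.append(pos)
        pos += len(s) + 1  # +1 skips the separator
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def find_in_nodes(index, pattern: str) -> List[Tuple[int, int]]:
    """Return sorted (node, offset) pairs for occurrences inside one label."""
    text, sa, starts = index
    m = len(pattern)
    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    for i in sa[first:lo]:
        node = bisect.bisect_right(starts, i) - 1  # node whose label contains i
        hits.append((node, i - starts[node]))
    return sorted(hits)
```

For labels `["ACGT", "GTA", "TTGT"]`, `find_in_nodes(build_index(labels), "GT")` returns `[(0, 2), (1, 0), (2, 2)]`. Even if the graph had an edge from the first node to the second, the occurrence of "TG" straddling "ACGT" and "GTA" would not be reported, because the index knows nothing about edges.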
“…Prior to the predominant prefix-sorting approach that we are going to discuss in detail in the next subsections, the problem of solving indexed path queries on labeled graphs has been tackled in the literature by resorting to geometric data structures [61,62]. These solutions work in the hypertext model, where the graph's nodes are long strings and directed edges between those strings indicate how they are connected according to an arbitrarily complicated topology.…”
Section: Hypertext Indexing (mentioning)
confidence: 99%
“…Pattern occurrences entirely contained in a single node are instead matched using a standard compressed index like the ones discussed in Section 2.1. The main issue with these solutions is that they cannot efficiently locate pattern occurrences spanning two or more edges; the solutions proposed in [61,62], based on seed-and-extend, require visiting the whole graph in the worst case (even though on realistic datasets they do work well). In practice, the problem is mitigated by the fact that the strings stored in each node are assumed to be very long.…”
Section: Hypertext Indexing (mentioning)
confidence: 99%
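A minimal sketch of seed-and-extend for occurrences that cross at least one edge follows. It assumes non-empty labels and a successor map `succ`. Seeds (nodes whose label ends with a proper prefix of the pattern) are found here by scanning every node, where a real implementation would query an index, and each seed is then extended depth-first along out-edges. The function name and graph representation are hypothetical, and this is a generic illustration of the technique rather than the specific algorithms of [61,62]. Because every non-empty label consumes at least one pattern character, the search terminates even on cyclic graphs, but in the worst case it still touches the whole graph, as the excerpt notes.

```python
from typing import Dict, List, Tuple


def spanning_occurrences(labels: Dict[str, str],
                         succ: Dict[str, List[str]],
                         pattern: str) -> List[Tuple[int, Tuple[str, ...]]]:
    """Find occurrences of `pattern` that cross at least one edge.

    Returns sorted (start_offset_in_first_node, node_path) pairs.
    Assumes every label is non-empty.
    """
    hits = []

    def extend(start: int, path: List[str], rest: str) -> None:
        # `rest` is the part of the pattern still to be spelled, starting
        # at the first character of some successor of path[-1].
        for v in succ.get(path[-1], []):
            label = labels[v]
            if len(label) >= len(rest):
                if label.startswith(rest):  # the occurrence ends inside v
                    hits.append((start, tuple(path + [v])))
            elif rest.startswith(label):    # v is fully covered; keep going
                extend(start, path + [v], rest[len(label):])

    for u, label in labels.items():
        # Seed: a suffix of lambda(u) equals pattern[:i]; at least one
        # pattern character must remain, so the occurrence leaves u.
        for i in range(1, min(len(label), len(pattern) - 1) + 1):
            if label.endswith(pattern[:i]):
                extend(len(label) - i, [u], pattern[i:])
    return sorted(hits)
```

With labels `{"a": "ACGT", "b": "GTA", "c": "TT", "d": "GA"}` and edges a → b, a → c, c → d, the pattern "GTTTGA" is found starting at offset 2 of node a and running through c into d.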
“…A graph is a diagram that allows any kind of genetic variant, large or small, to be represented as a path through the space of possible ways of gluing sequences together to form a genome. Past studies have considered the question of how to construct and represent genome graphs in a way that is both memory efficient and fast to query (18,41). Recent software tools make it easy to construct such graphs from genome sequences (43).…”
Section: Sequence Differences From the Reference (mentioning)
confidence: 99%
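To illustrate the idea that each genome is a path through the graph, the sketch below builds a hypothetical toy genome graph with one SNP bubble (`ref`/`alt`) and one optional insertion (`ins`), then spells every source-to-sink path. The node names and sequences are invented for illustration, and the recursive enumeration assumes an acyclic graph. Real genome graphs are typically exchanged in formats such as GFA and are not enumerated exhaustively, because the number of paths grows exponentially with the number of variant sites.

```python
from typing import Dict, List

# Hypothetical toy genome graph: node -> sequence, plus successor lists.
seq: Dict[str, str] = {
    "s": "ACG", "ref": "T", "alt": "C", "mid": "GA", "ins": "TTT", "e": "CA",
}
succ: Dict[str, List[str]] = {
    "s": ["ref", "alt"],   # SNP bubble: T or C
    "ref": ["mid"],
    "alt": ["mid"],
    "mid": ["ins", "e"],   # insertion present or skipped
    "ins": ["e"],
}


def genomes(node: str = "s") -> List[str]:
    """Spell every path from `node` to a sink (assumes the graph is acyclic)."""
    nexts = succ.get(node, [])
    if not nexts:
        return [seq[node]]
    return [seq[node] + tail for v in nexts for tail in genomes(v)]
```

Two SNP alleles times the presence or absence of the insertion give four distinct genome sequences, each one a path through the same small graph.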