2017
DOI: 10.1007/978-3-319-53733-7_9

Efficient Pattern Matching in Elastic-Degenerate Texts

Abstract: In this paper, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of k > 1 seeds (substrings/subpatterns) interleaved by elastic-degenerate symbols such that each elastic-degenerate symbol corresponds to a set of two or more variable length strings. Here, we present an algorithm for solving the pattern matching problem with (solid) pattern and elastic-degenerate text, running in O(N + αγnm) time; where m is the length of the …
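
To make the problem statement concrete, here is a minimal sketch of exact matching of a solid pattern against an elastic-degenerate text. It is a naive illustration, not the algorithm of the paper, and it makes no attempt to reach the stated time bound. The representation (a list of segments, each a list of alternative strings, with the empty string allowed), the function name `ed_find`, and the convention of reporting 0-based indices of the segments in which an occurrence ends are all assumptions made for this example.

```python
def ed_find(ed_text, pattern):
    """Return sorted 0-based indices of segments in which some occurrence
    of `pattern` ends, over all strings spelled by choosing one alternative
    per segment. Naive sketch; hypothetical helper, not the paper's method."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    ends = []
    # Lengths j (1 <= j < m) such that pattern[:j] matches a suffix of some
    # path through the segments processed so far.
    active = set()
    for idx, segment in enumerate(ed_text):
        next_active = set()
        found = False
        for s in segment:
            # Case 1: the occurrence lies entirely inside this alternative.
            if pattern in s:
                found = True
            # Case 2: extend partial matches carried over from earlier segments.
            for j in active:
                rest = m - j
                if len(s) < rest:
                    # s is consumed whole; the match stays partial.
                    if pattern[j:j + len(s)] == s:
                        next_active.add(j + len(s))
                elif s[:rest] == pattern[j:]:
                    found = True
            # Case 3: a new partial match starts inside this alternative.
            for l in range(1, min(len(s), m - 1) + 1):
                if s[-l:] == pattern[:l]:
                    next_active.add(l)
        if found:
            ends.append(idx)
        active = next_active
    return ends


# Example text: AC, then one of {G, T, empty}, then TA.
example = [["AC"], ["G", "T", ""], ["TA"]]
print(ed_find(example, "CTA"))  # matches only via the empty alternative
print(ed_find(example, "ACG"))
```

An empty alternative simply carries every partial match forward unchanged, which is how deletions in the underlying alignment are handled in this sketch.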

citations
Cited by 19 publications
(21 citation statements)
references
References 19 publications
“…Since genomic sequences are endowed with polymorphisms and sequencing errors, the existence of an exact occurrence can result into a strong assumption. The aim of this work is to generalize the studies of [11] and [27] for the exact case, allowing some approximation in the occurrences of the input pattern. We suggest a simple on-line O(kmG + kN)-time and O(m)-space algorithm, G being the total number of strings in T and k > 0 the maximum number of allowed substitutions in a pattern's occurrence, that is nonzero Hamming distance.…”
Section: Log M + N) (mentioning)
confidence: 99%
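
The approximate variant described in this statement can be illustrated with a small sketch that tolerates up to k substitutions (Hamming distance). It is a naive illustration under assumptions, not the cited O(kmG + kN)-time algorithm: the text is assumed to be a list of segments, each a list of alternative strings, and the names `hamming` and `ed_find_k_mismatches`, as well as the choice to report 0-based indices of segments in which an occurrence ends, are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ed_find_k_mismatches(ed_text, pattern, k):
    """Return sorted 0-based indices of segments in which some occurrence of
    `pattern` with at most k substitutions ends. Naive sketch only."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    ends = []
    # Maps a matched prefix length j (1 <= j < m) to the fewest mismatches
    # with which pattern[:j] aligns to a suffix of some path seen so far.
    active = {}
    for idx, segment in enumerate(ed_text):
        next_active = {}
        found = False
        for s in segment:
            # Occurrences lying entirely inside this alternative.
            if any(hamming(s[i:i + m], pattern) <= k
                   for i in range(len(s) - m + 1)):
                found = True
            # Extend partial alignments carried over from earlier segments.
            for j, errs in active.items():
                rest = m - j
                if len(s) < rest:
                    e = errs + hamming(s, pattern[j:j + len(s)])
                    if e <= k and e < next_active.get(j + len(s), k + 1):
                        next_active[j + len(s)] = e
                elif errs + hamming(s[:rest], pattern[j:]) <= k:
                    found = True
            # Start new partial alignments inside this alternative.
            for l in range(1, min(len(s), m - 1) + 1):
                e = hamming(s[-l:], pattern[:l])
                if e <= k and e < next_active.get(l, k + 1):
                    next_active[l] = e
        if found:
            ends.append(idx)
        active = next_active
    return ends


example = [["AC"], ["G", "T", ""], ["TA"]]
print(ed_find_k_mismatches(example, "AGG", 1))
```

Keeping only the smallest mismatch count per prefix length is enough here, because a partial alignment with fewer errors can always be extended wherever one with more errors can.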
“…In the literature, many different (compressed) representations and thus algorithms have been considered for pattern matching on a set of similar texts [4][5][6][7][8][9][10]. A natural representation of pan-genomes, or fragments of them, that we consider here are elastic-degenerate texts [11]. An elastic-degenerate text is a sequence which compactly represents a multiple alignment of several closely-related sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…We present here a natural representation of pan-genomes (whole genomes or their fragments): elastic-degenerate texts. An elastic-degenerate text (ED-text) is a sequence compactly representing a multiple alignment of several closely-related sequences: substrings that match exactly are collapsed, while those in positions where the sequences differ (by means of substitutions, insertions, and deletions of substrings) are called degenerate, and therein all possible variants observed at that location are listed [105]. Actually, ED-texts correspond exactly to the Variant Call Format (.vcf), the standard for files storing genomic variations [106].…”
Section: Many Acute and Chronic Diseases Originate As Network Disease (mentioning)
confidence: 99%
“…In the literature, many different (compressed) representations and thus algorithms have been considered for pattern matching on a set of similar texts [4,5,16,21,6,3,12]. A natural representation of pan-genomes, or fragments of them, that we consider here are elastic-degenerate texts [11]. An elastic-degenerate text is a sequence which compactly represents a multiple alignment of several closely-related sequences.…”
Section: Introduction (mentioning)
confidence: 99%