Solving the String Statistics Problem in Time O(n log n)

Brodal, Gerth Stølting; Lyngsø, Rune B.; Östlin, Anna; Pedersen, Christian N. S.

doi:10.7146/brics.v9i13.21731

Cited by 11 publications

(9 citation statements)

References 16 publications

“…Squares play a role in an augmentation of the suffix tree suitable to report, for any query pattern, the number of its non-overlapping occurrences. 6,10 There are multiple uses of suffix trees in setting up some kind of signature for text strings, as well as measures of similarity or difference.…”

Section: Fallout Extensions and Challenges (mentioning)

confidence: 99%

40 years of suffix trees

Apostolico, Crochemore, Farach-Colton et al. 2016

Commun. ACM

“…To pick the one with maximum SavedCost, we need the count of non-overlapping occurrences of these substrings. A Minimal Augmented Suffix Tree [5] over IT ∪M can be constructed and used to count the number of non-overlapping occurrences of all right-maximal repeats in overall O(L log L) time, where L is the total length of target strings. Using a regular suffix tree instead, this can be achieved in only O(L) time; but suffix tree may count overlapping occurrences.…”

Section: The Greedy Lexis Algorithm (mentioning)

confidence: 99%

Lexis

Siyari, Dilkina, Dovrolis 2016

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Data represented as strings abounds in biology, linguistics, document mining, web search and many other fields. Such data often have a hierarchical structure, either because they were artificially designed and composed in a hierarchical manner or because there is an underlying evolutionary process that creates repeatedly more complex strings from simpler substrings. We propose a framework, referred to as Lexis, that produces an optimized hierarchical representation of a given set of "target" strings. The resulting hierarchy, "Lexis-DAG", shows how to construct each target through the concatenation of intermediate substrings, minimizing the total number of such concatenations or DAG edges. The Lexis optimization problem is related to the smallest grammar problem. After we prove its NP-hardness for two cost formulations, we propose an efficient greedy algorithm for the construction of Lexis-DAGs. We also consider the problem of identifying the set of intermediate nodes (substrings) that collectively form the "core" of a Lexis-DAG, which is important in the analysis of Lexis-DAGs. We show that the Lexis framework can be applied in diverse applications such as optimized synthesis of DNA fragments in genomic libraries, hierarchical structure discovery in protein sequences, dictionary-based text compression, and feature extraction from a set of documents. (Lexis means "word" in Greek.)
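The citation statement above turns on the difference between counting all occurrences of a substring and counting only non-overlapping ones. A minimal sketch of that difference follows. It is a naive scan, not the Minimal Augmented Suffix Tree and not the O(L log L) method the statement refers to; the function names `count_overlapping` and `count_non_overlapping` are hypothetical and chosen only for illustration.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count every occurrence of pattern in text, overlaps included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return count


def count_non_overlapping(text: str, pattern: str) -> int:
    """Count the maximum number of pairwise non-overlapping occurrences.

    Taking the leftmost match and then skipping past it is optimal here
    because all occurrences have the same length.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Jump past the whole match so the next one cannot overlap it.
        start = text.find(pattern, start + len(pattern))
    return count


if __name__ == "__main__":
    print(count_overlapping("abababa", "aba"))      # 3
    print(count_non_overlapping("abababa", "aba"))  # 2
```

In "abababa" the pattern "aba" starts at positions 0, 2 and 4, but only the occurrences at 0 and 4 can be chosen together without overlap, which is why the two counts differ.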

“…They are labeled with the starting positions of the suffixes of w. We introduce the Cover Suffix Tree of w, denoted by CST(w), as an augmented (new nodes are added) suffix tree in which the nodes are annotated with information relevant to covers. CST(w) is similar to the data structure named Minimal Augmented Suffix Tree (see [3,5]).…”

Section: Augmented and Annotated Suffix Trees (mentioning)

confidence: 99%

Fast Algorithm for Partial Covers in Words

et al. 2014

A factor u of a word w is a cover of w if every position in w lies within some occurrence of u in w. A word w covered by u thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of u. In this article we introduce a new notion of α-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least α positions in w. We develop a data structure of O(n) size (where n = |w|) that can be constructed in O(n log n) time which we apply to compute all shortest α-partial covers for a given α. We also employ it for an O(n log n)-time algorithm computing a shortest α-partial cover for each α = 1, 2, ..., n.

Keywords: Cover of a word · Quasiperiodicity · Suffix tree
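The definitions of cover and α-partial cover in the abstract above can be checked directly on small inputs. The sketch below is a brute-force illustration of the definitions only. It is not the O(n)-size data structure or the O(n log n) algorithm from the article, and the names `covered_positions`, `is_cover` and `is_partial_cover` are hypothetical.

```python
def covered_positions(w: str, u: str) -> int:
    """Return how many positions of w lie inside some occurrence of u."""
    if not u:
        raise ValueError("u must be non-empty")
    covered = [False] * len(w)
    start = w.find(u)
    while start != -1:
        # Mark every position spanned by this occurrence.
        for i in range(start, start + len(u)):
            covered[i] = True
        # Step by one so overlapping occurrences are included.
        start = w.find(u, start + 1)
    return sum(covered)


def is_cover(w: str, u: str) -> bool:
    """u is a cover of w if every position of w is covered."""
    return covered_positions(w, u) == len(w)


def is_partial_cover(w: str, u: str, alpha: int) -> bool:
    """u is an alpha-partial cover of w if it covers at least alpha positions."""
    return covered_positions(w, u) >= alpha


if __name__ == "__main__":
    print(is_cover("aabaab", "aab"))              # True
    print(covered_positions("abacaba", "aba"))    # 6
    print(is_partial_cover("abacaba", "aba", 6))  # True
```

In "abacaba" the factor "aba" occurs at positions 0 and 4 and leaves only the middle "c" uncovered, so it is a 6-partial cover but not a cover.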
