2012
DOI: 10.1007/978-3-642-28332-1_21

A Faster Grammar-Based Self-index

Abstract: To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with r rules for a string S[1..n] whose LZ77 parse consists of z phrases, we can store a self-index for S in O(r + z log log n) space such that, given a pattern P[1..m], we can list the occ occurrences of P in S in O(m^2 + occ log log n) time. If the straight-line program is balanced and we accept a small probability of…

Cited by 81 publications (64 citation statements)
References 42 publications
“…Many proposals since then aimed at reducing the locating time by building on other compression methods that perform well on repetitive texts: indexes based on the Lempel-Ziv parse [76] of T, with size bounded in terms of the number z of phrases [73,42,97,9,88,15,23]; indexes based on the smallest context-free grammar (or an approximation thereof) that generates T and only T [68,21], with size bounded in terms of the size g of the grammar [25,26,41,89]; and indexes based on the size e of the smallest automaton (CDAWG) [18] recognizing the substrings of T [9,111,7]. Table 1 summarizes the pareto-optimal achievements.…”
Section: Related Work (mentioning)
confidence: 99%
“…There have been some indexes aimed at performing pattern matching on repetitive collections based on those techniques [17,16,8,10,13]. However, they do not provide the versatile suffix tree functionality, and they do not seem to yield a way to obtain it.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, compressed pattern matching [33], grammar-based self-index [34,35], random accessible data structure [36] and so on. One property of our grammar is that the height of the parse tree is bounded by O(log n); another property is that our algorithm can find long common substrings without Ω(n) space data structures.…”
Section: Discussion (mentioning)
confidence: 99%