Self-indexing Based on LZ77

Kreft, Sebastian; Navarro, Gonzalo

doi:10.1007/978-3-642-21458-5_6

Cited by 60 publications

(60 citation statements)

References 32 publications

“…Rather, we need repetition aware compression methods. Although this kind of compression is well-known (e.g., grammar-based and Ziv-Lempel-based compression), only recently there have appeared CSAs and other indexes that take advantage of repetitiveness [24][25][26][27]. Yet, those indexes do not support the full suffix tree functionality.…”

Section: Introduction (mentioning)

confidence: 99%

Practical Compressed Suffix Trees

2013

Self Cite

Abstract: The suffix tree is an extremely important data structure in bioinformatics. Classical implementations require much space, which renders them useless to handle large sequence collections. Recent research has obtained various compressed representations for suffix trees, with widely different space-time tradeoffs. In this paper we show how the use of range min-max trees yields novel representations achieving practical space/time tradeoffs. In addition, we show how those trees can be modified to index highly repetitive collections, obtaining the first compressed suffix tree representation that effectively adapts to that scenario.

“…We also use the compressed representation of PLCP [8]. Since in our case r ≪ n, we use a compressed bitmap representation that is useful for very sparse bitmaps [13]: We δ-encode the runs of 0s between consecutive 1s, and store absolute pointers to the representation of every s-th 1. This is very efficient in space and solves select_1 queries in time O(s), which is the operation needed to compute a PLCP value.…”

Section: Our Repetition-aware CST (mentioning)

confidence: 99%
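
The sparse-bitmap scheme described in the statement above (δ-encoded gaps plus a pointer to every s-th 1) could be sketched roughly as follows. This is a minimal illustration and not the cited authors' implementation: the names `SparseBitmap`, `delta_encode`, and `delta_decode` are hypothetical, bits are kept in a plain Python list rather than packed words, and storing the position of the preceding 1 next to each sampled pointer is an assumption made so that `select1` can start decoding mid-stream.

```python
def delta_encode(n, out):
    """Append the Elias-delta code of the integer n >= 1 to the bit list `out`."""
    nbits = n.bit_length()        # length of n in binary
    lbits = nbits.bit_length()    # length of nbits in binary
    out.extend([0] * (lbits - 1))                                   # unary prefix
    out.extend((nbits >> i) & 1 for i in range(lbits - 1, -1, -1))  # nbits itself
    out.extend((n >> i) & 1 for i in range(nbits - 2, -1, -1))      # n minus its leading 1


def delta_decode(bits, pos):
    """Decode one Elias-delta code starting at `pos`; return (value, next_pos)."""
    zeros = 0
    while bits[pos] == 0:
        zeros += 1
        pos += 1
    nbits = 0
    for _ in range(zeros + 1):
        nbits = (nbits << 1) | bits[pos]
        pos += 1
    n = 1                         # implicit leading 1 of n
    for _ in range(nbits - 1):
        n = (n << 1) | bits[pos]
        pos += 1
    return n, pos


class SparseBitmap:
    """Sparse bitmap: delta-coded gaps between 1s, sampled every s-th 1 (sketch)."""

    def __init__(self, ones, s):
        # `ones` is the sorted list of positions holding a 1 (assumed input format).
        self.s = s
        self.count = len(ones)
        self.bits = []
        self.samples = []         # (offset into self.bits, position of previous 1)
        prev = -1
        for k, p in enumerate(ones):
            if k % s == 0:
                self.samples.append((len(self.bits), prev))
            # Encode run of 0s plus one, so the coded value is always >= 1.
            delta_encode(p - prev, self.bits)
            prev = p

    def select1(self, j):
        """Position of the j-th 1 (1-based), decoding at most s gaps."""
        if not 1 <= j <= self.count:
            raise IndexError("select1 argument out of range")
        block, skip = divmod(j - 1, self.s)
        offset, pos = self.samples[block]
        for _ in range(skip + 1):
            gap, offset = delta_decode(self.bits, offset)
            pos += gap
        return pos
```

For instance, `SparseBitmap([3, 4, 100, 1000], s=2).select1(3)` jumps to the second sample and decodes a single gap to reach position 100. A larger `s` stores fewer samples at the cost of more decoding work per query.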

“…We used various DNA collections from the Repetitive Corpus at PizzaChili (http://pizzachili.dcc.uchile.cl/repcorpus, created and thoroughly studied by Kreft [12]). We took DNA collections Para and Influenza, which are the most repetitive ones, and Escherichia, a less repetitive one.…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

“…Rather, we need repetition aware compression methods. Although this kind of compression is well-known (e.g., grammar-based and Ziv-Lempel-based compression), only recently there have appeared compressed suffix arrays and other indexes capable of pattern searching that take advantage of repetitiveness [17,5,4,13]. Yet, none of the existing compressed suffix trees [26,8,7,23,25,9], is tailored to repetitive text collections.…”

Section: Introduction (mentioning)

confidence: 99%

Compressed Suffix Trees for Repetitive Texts

Abeliuk

Navarro

2012

String Processing and Information Retrieval

Self Cite

Abstract. We design a new compressed suffix tree specifically tailored to highly repetitive text collections. This is particularly useful for sequence analysis on large collections of genomes of the close species. We build on an existing compressed suffix tree that applies statistical compression, and modify it so that it works on the grammar-compressed version of the longest common prefix array, whose differential version inherits much of the repetitiveness of the text.

“…Repetitiveness is not captured by statistical compression methods nor frequency-based entropy definitions [16,24] (i.e., the frequencies of symbols do not change much if we add near-copies of an initial sequence). Rather, we need repetition aware compression methods.…”

Section: Introduction (mentioning)

confidence: 99%
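
The point made in the statement above, that symbol frequencies barely change when near-copies are appended, can be checked with a small experiment. The sketch below is illustrative only: the helper names `h0` and `compare` are hypothetical, the input is a synthetic random DNA string, and `zlib` (a Ziv-Lempel-style compressor from the Python standard library) stands in for a repetition-aware method; it is not one of the indexes discussed in the cited papers.

```python
import math
import random
import zlib
from collections import Counter


def h0(text):
    """Zero-order empirical entropy of `text`, in bits per symbol."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def compare(text):
    """Compare entropy and zlib output size for `text` and for two copies of it."""
    doubled = text + text
    return {
        "h0": h0(text),
        "h0_doubled": h0(doubled),
        # Level 9 makes zlib search harder for the long repeated match.
        "zlib": len(zlib.compress(text.encode(), 9)),
        "zlib_doubled": len(zlib.compress(doubled.encode(), 9)),
    }


# Synthetic input (an assumption for illustration): 4000 random DNA symbols.
rng = random.Random(0)
text = "".join(rng.choice("ACGT") for _ in range(4000))
stats = compare(text)
print(stats)
```

Because doubling the text leaves every relative frequency unchanged, the per-symbol entropy stays the same and a statistical bound of n·H0 bits simply doubles, whereas the Ziv-Lempel-style output should grow only slightly since the second copy is encoded as references to the first.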