Re-Pair in Small Space

Köppl, Dominik; Tomohiro, I; Furuya, Isamu; Takabatake, Yoshimasa; Sakai, Kensuke; Goto, Keisuke

doi:10.3390/a14010005

Cited by 2 publications

(2 citation statements)

References 40 publications


“…Its main drawback is that most implementations need Θ(n) space [11], and hence are not applicable on massive datasets. The only implementation using o(n) space is [22], but it is not practical. There is also work on running Re-Pair on the compressed input [30], but since it already requires the text as a grammar, it is not applicable in our case.…”

Section: Results (mentioning)

confidence: 99%

Fast and Space-Efficient Construction of AVL Grammars from the LZ77 Parsing

Kempa, Langmead

2021

Preprint


Grammar compression is, next to Lempel-Ziv (LZ77) and run-length Burrows-Wheeler transform (RLBWT), one of the most flexible approaches to representing and processing highly compressible strings. The main idea is to represent a text as a context-free grammar whose language is precisely the input string. This is called a straight-line grammar (SLG). An AVL grammar, proposed by Rytter [Theor. Comput. Sci., 2003], is a type of SLG that additionally satisfies the AVL-property: the heights of parse-trees for children of every nonterminal differ by at most one. In contrast to other SLG constructions, AVL grammars can be constructed from the LZ77 parsing in compressed time: O(z log n), where z is the size of the LZ77 parsing and n is the length of the input text. Despite these advantages, AVL grammars are thought to be too large to be practical. We present a new technique for rapidly constructing a small AVL grammar from an LZ77 or LZ77-like parse. Our algorithm produces grammars that are always at least five times smaller than those produced by the original algorithm, and never more than double the size of grammars produced by the practical Re-Pair compressor [Larsson and Moffat, Proc. IEEE, 2000]. Our algorithm also achieves low peak RAM usage. By combining this algorithm with recent advances in approximating the LZ77 parsing, we show that our method has the potential to construct a run-length BWT from an LZ77 parse in about one third of the time and peak RAM required by other approaches. Overall, we show that AVL grammars are surprisingly practical, opening the door to much faster construction of key compressed data structures.
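The AVL-property stated in the abstract can be illustrated with a minimal Python sketch. The representation is an assumption made only for this example: a grammar is a dictionary mapping each nonterminal to a pair of symbols, any symbol without a rule counts as a terminal of height 0, and the names `heights` and `is_avl_grammar` are hypothetical. It only checks the property; it is not the construction from the cited paper.

```python
def heights(rules):
    """Return the parse-tree height of every nonterminal in `rules`.

    `rules` maps a nonterminal to a (left, right) pair of symbols.
    Symbols that have no rule are treated as terminals of height 0.
    """
    memo = {}

    def height(sym):
        if sym not in rules:
            return 0
        if sym not in memo:
            left, right = rules[sym]
            memo[sym] = 1 + max(height(left), height(right))
        return memo[sym]

    for sym in rules:
        height(sym)
    return memo


def is_avl_grammar(rules):
    """Check that the children of every nonterminal differ in height by at most one."""
    hs = heights(rules)
    return all(
        abs(hs.get(left, 0) - hs.get(right, 0)) <= 1
        for left, right in rules.values()
    )


# A balanced grammar for "abab" and a left-leaning chain for "abcd".
balanced = {"X": ("a", "b"), "Y": ("X", "X")}
chain = {"A": ("a", "b"), "B": ("A", "c"), "C": ("B", "d")}
```

In this sketch `balanced` satisfies the property, while `chain` violates it at `C`, whose children have heights 2 and 0.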


“…On the other hand, RePair is known to achieve the best compression ratio on many real-world datasets and enjoy applications in web graph compression [10] and XML compression [31]. Some variants of RePair have also been proposed [32,6,17,15,14,28].…”

Section: Related Work (mentioning)

confidence: 99%

RePair Grammars are the Smallest Grammars for Fibonacci Words

Mieno, Inenaga, Horiyama

2022

Preprint


Grammar-based compression is a lossless data compression scheme that represents a given string w by a context-free grammar that generates only w. While computing the smallest grammar which generates a given string w is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigram in the string. Since there can be multiple choices of the most frequent bigram to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words F_k can be completely characterized by RePair, where F_k denotes the k-th Fibonacci word. Namely, all grammars for F_k generated by any implementation of RePair are the smallest grammars for F_k, and no other grammars can be the smallest for F_k. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.
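The replacement loop described in the abstract can be sketched in a few lines of Python. This is a naive illustration that recounts bigrams in every round; it is neither the linear-time algorithm of Larsson and Moffat nor the small-space variant. The function names, the `("N", i)` encoding of nonterminals, the tie-breaking by first occurrence, and the convention F_1 = "b", F_2 = "a", F_k = F_{k-1} F_{k-2} are all assumptions made for this example.

```python
from collections import Counter


def count_bigrams(seq):
    """Count non-overlapping occurrences of each bigram, left to right."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", skip an occurrence that overlaps the
        # one counted at the previous position.
        if pair[0] == pair[1] and last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def replace_pair(seq, pair, new_sym):
    """Replace occurrences of `pair` with `new_sym`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (final sequence, rules) after repeated bigram replacement."""
    seq, rules = list(text), {}
    while True:
        counts = count_bigrams(seq)
        if not counts:
            break
        # Ties are broken by first occurrence here; other choices are
        # possible and may yield different grammars.
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        new_sym = ("N", len(rules))  # tuples cannot clash with characters
        rules[new_sym] = pair
        seq = replace_pair(seq, pair, new_sym)
    return seq, rules


def expand(sym, rules):
    """Expand a symbol back into the string it derives."""
    if sym not in rules:
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def fibonacci_word(k):
    """k-th Fibonacci word, assuming F_1 = 'b', F_2 = 'a', F_k = F_{k-1} F_{k-2}."""
    prev, cur = "b", "a"
    if k == 1:
        return prev
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return cur
```

A round trip such as `"".join(expand(s, rules) for s in seq)` on the output of `repair(fibonacci_word(8))` recovers the original word, which is a quick sanity check that the rules form a grammar generating exactly that string.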
