Kensuke Sakai scite author profile

Given a string T of length N, the goal of grammar compression is to construct a small context-free grammar generating only T. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already gave a time-optimal algorithm that computes the RePair grammar RePair(T) in expected O(N) time, reducing its working space remains an active research topic, so that it can be applied to large-scale data. In this paper, we propose the first RePair algorithm working in compressed space, i.e., potentially o(N) space for highly compressible texts. The key idea is a new way to restructure an arbitrary grammar S for T into RePair(T) in compressed space and time. Based on the recompression technique, we propose an algorithm for RePair(T) in O(min(N, nm log N)) space and expected O(min(N, nm log N) m) time or O(min(N, nm log N) log log N) time, where n is the size of S and m is the number of variables in RePair(T). We implemented the variant of our algorithm running in O(min(N, nm log N) m) time and show that it can actually run in compressed space. We also present a new approach that reduces the peak memory usage of existing RePair algorithms by combining them with our algorithms, and show that the new approach outperforms, in both computation time and space, the most space-efficient linear-time RePair implementation to date.

ACM Subject Classification: Data structures design and analysis → Data compression

Digital Object Identifier: 10.4230/LIPIcs.CVIT.2016.2

To be precise, the improvement is achieved only when m = ω(log log N), which is likely to hold for compressible texts.

log N ≤ m is not necessarily true, since RePair stops producing variables when the input text is compressed into a string w containing no bigram with frequency ≥ 2. Still, it holds that log N ≤ m + |w|.
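For readers unfamiliar with RePair, the stopping condition mentioned above (no bigram with frequency ≥ 2 remains) can be illustrated with a minimal Python sketch. This is a naive quadratic-time illustration, not the compressed-space algorithm proposed in the paper and not the expected O(N)-time procedure of Larsson and Moffat; the function names `count_bigrams`, `naive_repair`, and `expand`, the variable naming scheme `X1, X2, ...`, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def count_bigrams(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Only pairs of equal symbols, e.g. ("a", "a"), can overlap themselves;
        # skip an occurrence that overlaps the one counted just before it.
        if pair[0] == pair[1] and last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def naive_repair(text):
    """Repeatedly replace a most frequent bigram with a fresh variable.

    Returns (seq, rules): the final sequence w and a dict mapping each
    variable to the pair of symbols it replaces. Assumes every symbol of
    `text` is a single character, so variable names never clash with it.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_bigrams(seq)
        if not counts:
            break
        # Ties are broken by first appearance (an arbitrary choice here).
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no bigram with frequency >= 2 is left
        var = f"X{len(rules) + 1}"
        rules[var] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(var)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Expand a sequence of symbols and variables back into the text."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)
```

In this sketch, m corresponds to `len(rules)` and |w| to the length of the returned sequence. Each round rescans the whole sequence, which is what practical implementations avoid by maintaining frequency tables and priority queues.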

Re-Pair in Small Space

Köppl, Tomohiro, Furuya et al. 2020. Algorithms.

Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution to this problem, we present, given a text of length n whose characters are drawn from an integer alphabet of size σ = n^O(1), an O(min(n^2, n^2 lg log_τ n lg lg lg n / log_τ n)) time algorithm computing Re-Pair with max((n/c) lg n, n lg τ) + O(lg n) bits of working space including the text space, where c ≥ 1 is a fixed user-defined constant and τ is the sum of σ and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm does not seem practical, since a preliminary version already needs roughly one hour to compute Re-Pair on one megabyte of text.

RePair in Compressed Space and Time

Sakai, Ohno, Goto et al. 2018. Preprint.

A faster implementation of online RLBWT and its application to LZ77 parsing
