2017 Data Compression Conference (DCC) 2017
DOI: 10.1109/dcc.2017.24

Space-Efficient Re-Pair Compression

Abstract: Re-Pair [1] is an effective grammar-based compression scheme achieving strong compression rates in practice. Let n, σ, and d be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and 5n + 4σ² + 4d + √n words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of log …
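For readers unfamiliar with the scheme, the following is a minimal Python sketch of the basic Re-Pair idea: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. It is a naive version that recounts all pairs in every round, so it is neither the expected-linear-time algorithm of [1] nor either of the space-efficient algorithms proposed in this paper. The function names (`count_pairs`, `replace_pair`, `repair`, `expand`) and the choice of integers as nonterminals are assumptions made for illustration only.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    last_same = -2  # index of the last counted pair of two equal symbols
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a == b and last_same == i - 1:
            continue  # overlaps the equal-symbol pair counted at i - 1
        counts[(a, b)] += 1
        if a == b:
            last_same = i
    return counts


def replace_pair(seq, pair, nonterminal):
    """Replace occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(nonterminal)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive Re-Pair: returns (final sequence, rules).

    Terminals are one-character strings; nonterminals are integers
    (an illustrative choice, not taken from the paper).
    """
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Ties are broken by first appearance in the sequence (assumption).
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        nonterminal = len(rules)
        rules[nonterminal] = pair
        seq = replace_pair(seq, pair, nonterminal)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the text it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Calling `repair("abracadabra")` and concatenating `expand(s, rules)` over the returned sequence reproduces the input. In this sketch the dictionary size d from the abstract corresponds to `len(rules)`.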

Citations: cited by 20 publications (31 citation statements)
References: 11 publications
“…In practice, the MR-order varies depending on the implementation of the priority queue that manages pairs. For this reason, we used four different implementations of RePair in the comparative analysis, and they were implemented by Maruyama (https://code.google.com/archive/p/re-pair/), Navarro (https://www.dcc.uchile.cl/~gnavarro/software/index.html), Prezza (https://github.com/nicolaprezza/Re-Pair) [7], and Wan (https://github.com/rwanwork/Re-Pair); we ran it with level 0 (no heuristic option), respectively. Table 1 lists the details of the texts that we used in the experiments.…”
Section: Methods (mentioning)
confidence: 99%
“…Despite its simple scheme, RePair is known for its high compression in practice [3][4][5], and hence, it has been comprehensively studied. Some examples of studies on the RePair algorithm include its extension to an online algorithm [6], practical working time/space improvements [7,8], applications to various fields [3,9,10], and theoretical analysis of generated grammar sizes [1,11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…In practice, the MR-order varies how we implement the priority queue managing symbol pairs. To see this, we used five RePair implementations in the comparison; they were implemented by Maruyama 3, Navarro 4, Prezza 5 [5], Wan 6, and Yoshida 7. Table 1 summarizes the details of the texts we used in the comparison.…”
Section: Methods (mentioning)
confidence: 99%
“…In our experiments, we combine our implementation described above with a well-tuned implementation of linear-time RePair by Maruyama [1] (denote it by RP). Setting t ∈ {2, 3, 4, 5}, we compare our method with RP and the most space efficient linear-time algorithm [6,2] to date (denote it by SERP). In theory, SERP runs in O(N/ε) time using at most (1.5 + ε)N words of space for arbitrary small ε ≤ 1, but ε is fixed to 1 in their implementation.…”
Section: Methods (mentioning)
confidence: 99%