Shogo Yoshida scite author profile

We address the problem of improving variable-length-to-fixed-length codes (VF codes). A VF code is an encoding scheme that uses a fixed-length code, and thus, one can easily access the compressed data. However, conventional VF codes usually have an inferior compression ratio to that of variable-length codes. Although a method proposed by T. Uemura et al. in 2010 achieves a good compression ratio comparable to that of gzip, it is very time consuming. In this study, we propose a new VF coding method that applies a fixed-length code to the set of rules extracted by the Re-Pair algorithm, proposed by N. J. Larsson and A. Moffat in 1999. The Re-Pair algorithm is a simple off-line grammar-based compression method that has good compression-ratio performance with moderate compression speed. Moreover, we present several experimental results to show that the proposed coding is superior to the existing VF coding.

Introduction

Our objective is to develop an effective variable-length-to-fixed-length code (VF code). A VF code is a coding scheme that parses an input text into a consecutive sequence of substrings and then assigns a fixed-length codeword to each parsed substring. Combining such algorithms with VF coding is a promising idea.

In this study, we propose a method to apply fixed-length coding to the rules ex[...] Re-Pair algorithm with fixed-length codewords, whereas the original algorithm utilizes variable-length codewords to achieve an extremely good compression ratio. To minimize the decrease in the compression ratio compared to the original algorithm, we exploit a simple characteristic of the algorithm: the minimum output size frequently occurs during the process of repeated bigram replacement. Because all the codewords have equal length in our method, we can easily estimate the final output size for each intermediate rule set of the Re-Pair algorithm. Therefore, by preserving the best point and rewinding the rule set back to this point, we can obtain the minimum output at a reasonable cost.

The performance of the proposed method is evaluated through experiments on some corpora. The experimental results show that the compression ratio of the proposed method is approximately equal to that of bzip even though it uses fixed-length codewords. The compression speed is approximately the same as that of the original Re-Pair algorithm. Pattern-matching performance is also demonstrated on compressed texts, and it is confirmed that compressed pattern matching with our VF code is faster than UNIX zgrep, which is a typical decompress-then-search method, i.e., gunzip-then-grep.

Our contributions can be summarized as follows:

• We developed a new VF coding that has a superior compression ratio and compression time compared with those of the existing VF coding. The proposed method is based on a general concept. However, it was not obvious whether the method was really effective.

• We demonstrated experimentally that pattern matching can be performed faster on a text compressed by our method than on the text compressed by the decompress-then...
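The idea of estimating the output size at each intermediate rule set and keeping the best point can be illustrated with a minimal Python sketch. It is not the authors' implementation: the size model in `estimated_bits` (one fixed-width codeword per sequence symbol plus two codewords per rule) is an assumption, all function names are hypothetical, and the bigram search is a naive quadratic scan rather than the priority-queue machinery of the original Re-Pair algorithm. Instead of literally rewinding, the sketch keeps a snapshot of the sequence at the best point.

```python
from collections import Counter
from math import ceil, log2


def replace_pair(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair` in `seq`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def estimated_bits(seq_len, num_rules, num_terminals):
    """Assumed size model (not taken from the paper): every symbol is one
    fixed-width codeword, and each rule is stored as two codewords."""
    width = ceil(log2(max(2, num_terminals + num_rules)))
    return (seq_len + 2 * num_rules) * width


def repair_vf(text):
    """Run simplified Re-Pair and return the rule set with the smallest
    estimated fixed-length output size seen along the way."""
    terminals = sorted(set(text))
    seq = [terminals.index(c) for c in text]
    rules = []
    # best = (estimated bits, number of rules, snapshot of the sequence)
    best = (estimated_bits(len(seq), 0, len(terminals)), 0, list(seq))
    while len(seq) >= 2:
        # Simplification: adjacent pairs are counted with overlaps.
        pair, freq = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if freq < 2:
            break
        new_symbol = len(terminals) + len(rules)
        rules.append(pair)
        seq = replace_pair(seq, pair, new_symbol)
        bits = estimated_bits(len(seq), len(rules), len(terminals))
        if bits < best[0]:
            best = (bits, len(rules), list(seq))
    bits, n_rules, best_seq = best
    # Discard the rules created after the best point.
    return terminals, rules[:n_rules], best_seq, bits


def expand(symbol, terminals, rules):
    """Recursively expand a symbol back into the text it stands for."""
    if symbol < len(terminals):
        return terminals[symbol]
    left, right = rules[symbol - len(terminals)]
    return expand(left, terminals, rules) + expand(right, terminals, rules)


def decode(terminals, rules, seq):
    return "".join(expand(s, terminals, rules) for s in seq)
```

Because every codeword has the same width under this model, the estimate after each replacement is a single multiplication, which is what makes tracking the best point cheap. Calling `repair_vf(text)` and passing the first three returned values to `decode` should reproduce `text`.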



Shogo Yoshida

High-resolution observations of dissolved isoprene in surface seawater in the Southern Ocean during austral summer 2010–2011

Electrostatic properties of C–S–H and C-A-S-H for predicting calcium and chloride adsorption

Adaptive Dictionary Sharing Method for Re-Pair Algorithm

An Efficient Algorithm for Almost Instantaneous VF Code Using Multiplexed Parse Tree

Effective Variable-Length-to-Fixed-Length Coding via a Re-Pair Algorithm
