The ongoing evolution of hardware leads to a steady increase in the amount of data that is processed, transmitted, and stored. Data compression is an essential tool for keeping this amount of data manageable. Furthermore, techniques from data compression have many applications beyond compression itself, for instance data clustering, classification, and time series prediction.

In terms of empirical performance, statistical data compression algorithms rank among the best. A statistical data compressor processes an input text letter by letter and performs compression in two stages: modeling and coding. During modeling, a model estimates a probability distribution on the next letter based on the past input. During coding, an encoder translates this probability distribution and the next letter into a codeword. Decoding reverses this process. Note that the model is exchangeable, and its actual choice determines a statistical data compression algorithm. All major models use a mixer to combine multiple simple probability estimators, so-called elementary models.

In statistical data compression there is an increasing gap between theory and practice. On the one hand, the "theoretician's approach" emphasizes models that allow for a mathematical code length analysis to evaluate their performance, but it neglects running time and space considerations as well as empirical improvements. On the other hand, the "practitioner's approach" focuses on the reverse. The PAQ family of statistical compressors has demonstrated the superiority of the practitioner's approach in terms of empirical compression rates.

With this thesis we attempt to bridge the aforementioned gap between theory and practice, with special focus on PAQ. To achieve this, we apply the theoretician's tools to the practitioner's approaches: we provide a code length analysis for several common and practical modeling and mixing techniques. The analysis covers modeling by relative frequencies with frequency discount and modeling by exponential smoothing of probabilities. For mixing, we consider linear and geometrically weighted averaging of probabilities with Online Gradient Descent for weight estimation (illustrative sketches of these techniques follow below). Our results show that the models and mixers we consider perform nearly as well as idealized competitors that may adapt to the input. Experiments support our analysis. Moreover, our results add a theoretical justification to modeling and mixing as done in PAQ, and they generalize methods from PAQ.
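To make the two kinds of elementary models named above concrete, the following Python sketch gives minimal versions of both: relative frequencies with frequency discount, and exponential smoothing of probabilities. This is an illustrative sketch only; the restriction to a binary alphabet, the class names, and the parameters alpha, discount, and threshold are assumptions, not the thesis's exact formulations.

```python
class DiscountedFrequencyModel:
    """Elementary model: relative frequencies with frequency discount
    (illustrative sketch, binary alphabet assumed)."""

    def __init__(self, discount=0.5, threshold=64.0):
        self.counts = [1.0, 1.0]     # Laplace-style initial counts
        self.discount = discount     # factor applied when discounting
        self.threshold = threshold   # total count that triggers a discount

    def predict(self):
        # Relative frequency of the bit 1 so far.
        return self.counts[1] / (self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1.0
        # Shrink counts once they grow large, so the estimate
        # can track changing statistics in the input.
        if self.counts[0] + self.counts[1] > self.threshold:
            self.counts = [c * self.discount for c in self.counts]


class ExponentialSmoothingModel:
    """Elementary model: exponential smoothing of probabilities
    (illustrative sketch, binary alphabet assumed)."""

    def __init__(self, alpha=0.02, p_init=0.5):
        self.alpha = alpha   # smoothing rate in (0, 1), an assumption
        self.p = p_init      # current estimate of P(next bit = 1)

    def predict(self):
        return self.p

    def update(self, bit):
        # Blend the old estimate with the new observation:
        # p <- (1 - alpha) * p + alpha * bit
        self.p = (1.0 - self.alpha) * self.p + self.alpha * bit
```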
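Similarly, the sketch below illustrates geometrically weighted averaging of binary probabilities (logistic mixing, as popularized by PAQ) with an Online Gradient Descent step on the instantaneous code length. The learning rate, the clamping constant, and the update's exact form are assumptions of this sketch rather than the thesis's analyzed variant; linear mixing would instead form a convex combination of the probabilities themselves.

```python
import math

def logit(p):
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GeometricMixer:
    """Geometric (logistic) mixing of binary probabilities with an
    Online Gradient Descent weight update (illustrative sketch)."""

    def __init__(self, n_models, lr=0.02):
        self.w = [0.0] * n_models   # mixing weights
        self.lr = lr                # learning rate, an assumption
        self.stretched = [0.0] * n_models

    def mix(self, probs):
        # Geometric weighted averaging of binary probabilities is a
        # weighted sum in the logit ("stretched") domain.
        self.stretched = [logit(p) for p in probs]
        return sigmoid(sum(w * s for w, s in zip(self.w, self.stretched)))

    def update(self, p_mixed, bit):
        # OGD on the instantaneous code length -log p(bit); for
        # logistic mixing the gradient reduces to (bit - p) * logit(p_i).
        err = bit - p_mixed
        self.w = [w + self.lr * err * s
                  for w, s in zip(self.w, self.stretched)]
```

A hypothetical usage, combining the elementary models sketched earlier with the mixer, might look as follows; a real compressor would feed each mixed prediction into an arithmetic coder, which emits roughly -log2 p(bit) bits per symbol.

```python
models = [DiscountedFrequencyModel(), ExponentialSmoothingModel()]
mixer = GeometricMixer(n_models=len(models))
for bit in [1, 0, 1, 1, 0, 1]:          # toy input
    p = mixer.mix([m.predict() for m in models])
    mixer.update(p, bit)
    for m in models:
        m.update(bit)
```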