Asymptotic Optimal Lossless Compression via the CSE Technique

Yokoo, Hidetoshi

doi:10.1109/ccp.2011.32

Cited by 15 publications

(9 citation statements)

References 9 publications


“…(1,1) and $q^{(k,l)}_{(1,2)}$ are denoted by $\pi_c(q)$ and $\sigma_c(q)$, respectively, where both $\pi_r(q)$ and $\sigma_r(q)$ are $\lambda^{[k,0]}$ when $l = 0$ and $1$. Figure 2 shows $\pi_c(p)$, $\sigma_c(p)$, $\pi_r(p)$, and $\sigma_r(p)$ from left to right for $p$ in Figure 1.…”

Section: Subblock Concatenation and Dictionary

mentioning

confidence: 99%

“…Dubé and Beaudoin proposed an efficient off-line lossless data compression algorithm for a binary source known as Compression via Substring Enumeration (CSE) [1]. In [2], Yokoo proposed a universal CSE algorithm for an ergodic source with a binary alphabet, and various versions of CSE for a binary source have been proposed so far [3]-[5]. Reportedly, the performance of compression ratios of CSE [4] is better than that of an efficient off-line data compression algorithm using the Burrows-Wheeler transformation (BWT) [6].…”

Section: Introduction

mentioning

confidence: 99%


A Universal Two-Dimensional Source Coding by Means of Subblock Enumeration

Ota

Morita

Manada

2019

IEICE Trans. Fundamentals


The technique of lossless compression via substring enumeration (CSE) is a kind of enumerative code and uses a probabilistic model built from the circular string of an input source for encoding a one-dimensional (1D) source. CSE is applicable to two-dimensional (2D) sources, such as images, by dealing with a line of pixels of a 2D source as a symbol of an extended alphabet. At the initial step of the CSE encoding process, we need to output the number of occurrences of all symbols of the extended alphabet, so that the time complexity increases exponentially when the size of the source becomes large. To reduce computational time, we can rearrange pixels of a 2D source into a 1D source string along a space-filling curve like a Hilbert curve. However, information on adjacent cells in a 2D source may be lost in the conversion. To reduce the time complexity and compress a 2D source without converting to a 1D source, we propose a new CSE which can encode a 2D source in a block-by-block fashion instead of in a line-by-line fashion. The proposed algorithm uses the flat torus of an input 2D source as a probabilistic model instead of the circular string of the source. Moreover, we prove the asymptotic optimality of the proposed algorithm for 2D general sources.

key words: compression via substring enumeration, enumerative code, universal source coding, two-dimensional, general source

Basic Notations and Definitions

Alphabet and Block

Let X be a finite source alphabet {0, 1, ..., J − 1} and let |X| be the cardinality of X, that is, |X| = J. Let X [m,n] …
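The flat-torus model mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "flat torus" means the 2D source is read with wrap-around in both the row and the column direction, by analogy with the circular string of the 1D case. The function name `torus_subblock_counts` and the sample grid are hypothetical and are not taken from the cited paper.

```python
from collections import Counter

def torus_subblock_counts(grid, h, w):
    """Count every h x w subblock of grid, wrapping around both edges.

    Assumption: the torus is formed by identifying the top edge with the
    bottom edge and the left edge with the right edge of the source.
    """
    rows, cols = len(grid), len(grid[0])
    counts = Counter()
    for i in range(rows):          # one subblock per top-left cell
        for j in range(cols):
            block = tuple(
                tuple(grid[(i + di) % rows][(j + dj) % cols] for dj in range(w))
                for di in range(h)
            )
            counts[block] += 1
    return counts

grid = [[0, 1],
        [1, 0]]
counts = torus_subblock_counts(grid, 1, 2)
```

Because every cell is the top-left corner of exactly one subblock, the counts always sum to the number of cells in the source, whatever the subblock size.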


“…In particular, $(m-1) \times n$ subblocks $p^{(m-1,n)}_{(1,1)}$ and $p^{(m,n)}_{(2,1)}$ are denoted by $\pi_r(p)$ and $\sigma_r(p)$, respectively. Moreover, $m \times (n-1)$ subblocks $p^{(m,n-1)}_{(1,1)}$ and $p^{(m,n)}_{(1,2)}$ are denoted by $\pi_c(p)$ and $\sigma_c(p)$, respectively. For example, for $p$ in Fig.…”

Section: B Subblock Concatenation and Dictionary

mentioning

confidence: 99%
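The four subblock operators in the statement above can be sketched as list operations. This is a minimal illustration that assumes the two index pairs attached to p give the top-left and bottom-right corners of the subblock, so that π_r and σ_r drop the last and the first row, and π_c and σ_c drop the last and the first column. The function names are hypothetical.

```python
def pi_r(p):
    """Rows 1..m-1 of an m x n block (drop the last row)."""
    return [row[:] for row in p[:-1]]

def sigma_r(p):
    """Rows 2..m of an m x n block (drop the first row)."""
    return [row[:] for row in p[1:]]

def pi_c(p):
    """Columns 1..n-1 of an m x n block (drop the last column)."""
    return [row[:-1] for row in p]

def sigma_c(p):
    """Columns 2..n of an m x n block (drop the first column)."""
    return [row[1:] for row in p]

p = [[1, 2, 3],
     [4, 5, 6]]
```

Under this reading, `pi_r(p)` and `sigma_r(p)` are the 1 x 3 blocks `[[1, 2, 3]]` and `[[4, 5, 6]]`, while `pi_c(p)` and `sigma_c(p)` are the 2 x 2 blocks `[[1, 2], [4, 5]]` and `[[2, 3], [5, 6]]`.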

“…where min(·) is the left-hand term of (10). For encoding $N(b_i)$ by an entropy coding, a probability is assigned to $N(b_i)$ as follows [2].…”

Section: Review of Conventional CSE

mentioning

confidence: 99%

Two-dimensional source coding by means of subblock enumeration

Ota

Morita

2017

2017 IEEE International Symposium on Information Theory (ISIT)


Abstract: It is reported that the performance of the CSE [4] is as good as that of an efficient off-line data compression algorithm using the Burrows-Wheeler transformation (BWT) [6]. In [7], it is proved that an encoder, which is a deterministic finite automaton, of the CSE and an encoder without sinks of the antidictionary coding [8] are isomorphic for a binary source. Moreover, an antidictionary coding proposed in [9] provided the first CSE for q-ary (q > 2) alphabet sources as a byproduct. Iwata and Arimura proposed the modified algorithm and evaluated the maximum redundancy rate of the CSE for the kth order Markov sources [10].

For encoding an input source, the CSE utilizes a probabilistic model built from the circular string which is obtained by concatenating the first symbol to the last symbol of the source. A probabilistic model of the circular string is also useful for the BWT and antidictionary coding [7], [9], and in [11], it is shown that an antidictionary built from the circular string is useful for comparing genomes such as deoxyribonucleic acid (DNA) sequences. However, for a 2D source such as an image, the computational time of the CSE is exponential with respect to line length since the CSE works line by line. The CSE deals …
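The circular-string model described above can be sketched as a substring counter. This is a minimal illustration, not the encoder of any cited paper: it assumes the model is the table of occurrence counts of substrings read with wrap-around, and the function name `circular_substring_counts` is hypothetical.

```python
from collections import Counter

def circular_substring_counts(s, length):
    """Count substrings of the given length in the circular string of s.

    Assumes 1 <= length <= len(s). Doubling s lets a plain slice read past
    the last symbol and continue from the first one.
    """
    n = len(s)
    doubled = s + s
    return Counter(doubled[i:i + length] for i in range(n))

counts1 = circular_substring_counts("0110", 1)
counts2 = circular_substring_counts("0110", 2)
```

On a circular string every occurrence of a word has exactly one right extension, so the count of "0" equals the count of "00" plus the count of "01". The line-based 2D treatment in the abstract corresponds to running this kind of counter over symbols that are whole pixel lines, which is where the exponential alphabet size comes from.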


“…Dubé and Yokoo [2] proved that CSE has a linear complexity both in time and in space worst-case performance for the length of string to be encoded. Dubé and Yokoo have specified appropriate predictors of the uniform and combinatorial prediction models for CSE, and proved that CSE has the asymptotic optimality for stationary binary ergodic sources [2,3]. Our previous study [4] evaluated the worst-case maximum redundancy of the modified CSE for an arbitrary binary string from the class of k-th order Markov sources.…”

mentioning

confidence: 99%

Lossless Data Compression via Substring Enumeration for k-th Order Markov Sources with a Finite Alphabet

Iwata

Arimura

2015

2015 Data Compression Conference


Dubé and Beaudoin have proposed a technique of lossless data compression called compression via substring enumeration (CSE) for a binary source alphabet [1]. Dubé and Yokoo [2] proved that CSE has a linear complexity both in time and in space worst-case performance for the length of string to be encoded. Dubé and Yokoo have specified appropriate predictors of the uniform and combinatorial prediction models for CSE, and proved that CSE has the asymptotic optimality for stationary binary ergodic sources [2,3]. Our previous study [4] evaluated the worst-case maximum redundancy of the modified CSE for an arbitrary binary string from the class of k-th order Markov sources. We propose a generalization of CSE for k-th order Markov sources with a finite alphabet X based on Ota and Morita [5] in this study. Consequently, we analyze the worst-case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. We also clarify that the compression ratio of CSE asymptotically converges to the optimal one with rate $(|X|^k(|X|-1)+1)\frac{\log n}{n}$ for k-th order Markov sources, if the length n of a source string tends to infinity, where |X| denotes the cardinality of X.
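To get a feel for the convergence rate quoted in this abstract, the sketch below evaluates it numerically. It assumes the rate reads (|X|^k (|X| − 1) + 1) · (log n) / n and that the logarithm is base 2; neither the base nor the exact grouping is confirmed by the text above. The function name `cse_rate` is hypothetical.

```python
import math

def cse_rate(alphabet_size, k, n):
    """Evaluate (|X|^k (|X| - 1) + 1) * log2(n) / n.

    Assumptions: this grouping of the formula, and a base-2 logarithm.
    """
    coefficient = alphabet_size ** k * (alphabet_size - 1) + 1
    return coefficient * math.log2(n) / n

binary_first_order = cse_rate(2, 1, 1024)   # coefficient 3, log2(1024) = 10
```

For a binary first-order Markov source and n = 1024 this gives 30/1024, roughly 0.029, and the value shrinks toward zero as n grows, which is what asymptotic convergence to the optimal ratio means here.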
