This paper describes universal lossless coding strategies for compressing sources over countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper bounds on minimax regret and lower bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of Normalized Maximum Likelihood codes with respect to minimax regret in the infinite-alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors, the bounds match for source classes defined by algebraically decaying (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. These results extend our knowledge of universal coding to contexts where the key tools from parametric inference are known to fail.

Keywords: NML; countable alphabets; redundancy; adaptive compression; minimax.
I. INTRODUCTION

This paper is concerned with the problem of universal coding on a countably infinite alphabet $\mathcal{X}$ (say the set of positive integers $\mathbb{N}_+$ or the set of natural numbers $\mathbb{N}$), as described for example by .

Throughout this paper, a source on the countable alphabet $\mathcal{X}$ is a probability distribution on the set $\mathcal{X}^{\mathbb{N}}$ of infinite sequences of symbols from $\mathcal{X}$ (this set is endowed with the $\sigma$-algebra generated by sets of the form $\prod_{i=1}^{n} \{x_i\} \times \mathcal{X}^{\mathbb{N}}$, where all $x_i \in \mathcal{X}$ and $n \in \mathbb{N}$). The symbol $\Lambda$ will be used to denote various classes of sources on the countably infinite alphabet $\mathcal{X}$. The sequence of symbols emitted by a source is denoted by the $\mathcal{X}^{\mathbb{N}}$-valued random variable $X = (X_n)_{n \in \mathbb{N}}$. If $P$ denotes the distribution of $X$, then $P^n$ denotes the distribution of $X_{1:n} = X_1, \ldots, X_n$, and we let $\Lambda^n = \{P^n : P \in \Lambda\}$. For any countable set $\mathcal{X}$, let $\mathcal{M}_1(\mathcal{X})$ be the set of all probability measures on $\mathcal{X}$.

By Shannon's noiseless coding theorem (see Cover and Thomas, 1991), the binary entropy of $P^n$, $H(X_{1:n}) = \mathbb{E}_{P^n}[-\log P(X_{1:n})]$, provides a tight lower bound on the expected number of binary symbols needed to encode outcomes of $P^n$. Throughout the paper, logarithms are taken in base 2. In the following, we only consider finite-entropy sources on countable alphabets, and we implicitly assume that $H(X_{1:n}) < \infty$. The expected redundancy of any distribution $Q^n \in \mathcal{M}_1(\mathcal{X}^n)$, defined as the difference between the expected code length $\mathbb{E}_{P}[-\log Q^n(X_{1:n})]$ and $H(X_{1:n})$, is equal to the Kullback-Leibler divergence (or relative entropy)
$$D(P^n, Q^n) = \sum_{x \in \mathcal{X}^n} P^n\{x\} \log \frac{P^n(x)}{Q^n(x)} = \mathbb{E}_{P^n}\!\left[\log \frac{P^n(X_{1:n})}{Q^n(X_{1:n})}\right].$$
Universal coding attempts to develop sequences of coding probabilities $(Q^n)_n$ so as to minimize the expected redundancy over a whole class of sources. Technically speaking, several distinct notions of universality...
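To make the quantities above concrete, the following minimal Python sketch (not from the paper) evaluates the expected redundancy incurred when a memoryless source with marginal $P$ is coded with the product coding distribution built from another marginal $Q$: since the Kullback-Leibler divergence is additive over product measures, $D(P^n, Q^n) = n\,D(P, Q)$. The geometric marginals and the truncation of the countable support are illustrative assumptions, not the envelope classes studied in the paper.

```python
import math

# Sketch: expected redundancy of a memoryless coding distribution Q^n under a
# memoryless source P^n equals n * D(P, Q), where
#   D(P, Q) = sum_x P(x) * log2(P(x) / Q(x))
# is the Kullback-Leibler divergence between the marginals (in bits).
# Geometric marginals and the truncation level are illustrative choices.

def geometric_pmf(theta, x):
    """P(X = x) = (1 - theta) * theta**(x - 1) for x = 1, 2, ..."""
    return (1.0 - theta) * theta ** (x - 1)

def kl_divergence_bits(p, q, support):
    """D(P, Q) in bits over a truncated countable support, with 0 log 0 = 0."""
    return sum(p(x) * math.log2(p(x) / q(x)) for x in support if p(x) > 0.0)

support = range(1, 500)               # truncation of the countable alphabet N_+
P = lambda x: geometric_pmf(0.5, x)   # "true" source marginal
Q = lambda x: geometric_pmf(0.6, x)   # marginal of the coding distribution

d = kl_divergence_bits(P, Q, support)
n = 1000
print(f"per-symbol expected redundancy D(P, Q) ~ {d:.4f} bits")
print(f"expected redundancy for n = {n}: ~ {n * d:.1f} bits")
```

The universal coding problem discussed next differs from this sketch in that $P$ is unknown: the coding probabilities $(Q^n)_n$ must keep the redundancy small simultaneously for every source in the class $\Lambda$, not for a single fixed $P$.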