Abstract. This paper deals with transportation polytopes in the probability simplex (that is, sets of categorical bivariate probability distributions with prescribed marginals). Information projections between such polytopes are studied, and a sufficient condition is described under which these mappings are homeomorphisms.
Preliminaries

Let $\Gamma_n$ denote the set of probability distributions with alphabet $\{1, \ldots, n\}$:

$$\Gamma_n = \Big\{ (p_1, \ldots, p_n) : p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1 \Big\}. \tag{1.1}$$

The support of a probability distribution $P = (p_i)$ is denoted by $\operatorname{supp}(P) = \{i : p_i > 0\}$, and its size by $|\operatorname{supp}(P)|$. The support of a set $\mathcal{P}$ of probability distributions is defined as $\operatorname{supp}(\mathcal{P}) = \bigcup_{P \in \mathcal{P}} \operatorname{supp}(P)$. If $\mathcal{P}$ is convex, then there must exist $P \in \mathcal{P}$ with $\operatorname{supp}(P) = \operatorname{supp}(\mathcal{P})$. We will also write $P(i)$ for the masses $p_i$ of $P$.

Let $C(P, Q)$ denote the set of all bivariate probability distributions with marginals $P \in \Gamma_n$ and $Q \in \Gamma_m$:

$$C(P, Q) = \Big\{ S = (s_{i,j}) : s_{i,j} \ge 0, \ \sum_{j=1}^{m} s_{i,j} = p_i, \ \sum_{i=1}^{n} s_{i,j} = q_j \Big\}. \tag{1.2}$$

Such sets are special cases of the so-called transportation polytopes, and have been studied extensively in probability, statistics, geometry, combinatorics, etc. (see, e.g., [2,14]). In information-theoretic approaches to statistics, and in particular to the analysis of (multidimensional) contingency tables, a basic role is played by the so-called information projections; see [7] and the references therein. This motivates our study, presented in this note, of some formal properties of information projections (I-projections for short) over domains of the form $C(P, Q)$. I-projections onto $C(P, Q)$ also arise in binary hypothesis testing; see [13]. Further information-theoretic results (in a fairly different direction) regarding transportation polytopes can be found in [12].

Relative entropy (information divergence, Kullback-Leibler divergence) of the distribution $P$ with respect to the distribution $Q$ is defined by:

$$D(P \| Q) = \sum_{i} p_i \log \frac{p_i}{q_i}, \tag{1.3}$$

with the conventions $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$, for every $q \ge 0$, $p > 0$, being understood. The functional $D$ is nonnegative, equals zero if and only if $P = Q$, and is jointly convex in its arguments [6].

For a probability distribution $S$ and a set of distributions $\mathcal{T}$, the I-projection [3,4,5,15,8] of $S$ onto $\mathcal{T}$ is defined as the unique minimizer (if it exists) of the functional $D(T \| S)$ over all $T \in \mathcal{T}$. We shall study here I-projections as mappings between sets of the form $C(P, Q)$.
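As a purely numerical illustration of the definitions above (it plays no role in the formal development), the following Python sketch evaluates the relative entropy with the stated conventions, tests membership in a set $C(P, Q)$, and approximates an I-projection onto such a set. The function names `kl_divergence`, `in_polytope`, and `ipf_projection` and the example distributions are hypothetical choices made here. The projection is approximated by iterative proportional fitting (alternately rescaling rows and columns), a standard procedure that is not taken from the text and is assumed adequate only because the starting distribution in the example has full support.

```python
import math


def kl_divergence(p, q):
    """D(P||Q) = sum_i p_i log(p_i / q_i), with 0 log(0/q) = 0 and p log(p/0) = inf."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # convention: 0 log(0/q) = 0
        if qi == 0.0:
            return math.inf   # convention: p log(p/0) = infinity for p > 0
        total += pi * math.log(pi / qi)
    return total


def flatten(s):
    """View a bivariate distribution (list of rows) as a single probability vector."""
    return [x for row in s for x in row]


def in_polytope(s, p, q, tol=1e-9):
    """Check numerically whether s has row marginal p and column marginal q."""
    rows = [sum(row) for row in s]
    cols = [sum(col) for col in zip(*s)]
    return (all(x >= 0.0 for x in flatten(s))
            and all(abs(a - b) <= tol for a, b in zip(rows, p))
            and all(abs(a - b) <= tol for a, b in zip(cols, q)))


def ipf_projection(s, p, q, sweeps=500):
    """Approximate the minimizer of D(T||S) over C(p, q) by iterative
    proportional fitting (assumption: s has full support, so the
    alternating rescaling converges)."""
    t = [row[:] for row in s]
    for _ in range(sweeps):
        # Rescale each row so that the row marginal equals p.
        for i, row in enumerate(t):
            r = sum(row)
            if r > 0.0:
                t[i] = [x * p[i] / r for x in row]
        # Rescale each column so that the column marginal equals q.
        cols = [sum(col) for col in zip(*t)]
        for row in t:
            for j in range(len(row)):
                if cols[j] > 0.0:
                    row[j] *= q[j] / cols[j]
    return t


# Hypothetical example: S lies in C(P1, Q1) and is projected onto C(P2, Q2).
p1, q1 = [0.5, 0.5], [0.4, 0.6]
p2, q2 = [0.6, 0.4], [0.5, 0.5]
s = [[0.3, 0.2], [0.1, 0.4]]

t = ipf_projection(s, p2, q2)
divergence = kl_divergence(flatten(t), flatten(s))
```

In this example the rescaling preserves the cross-product ratio $s_{1,1} s_{2,2} / (s_{1,2} s_{2,1}) = 6$ of $S$; together with the prescribed marginals this determines the limit, whose entries are $0.4, 0.2, 0.1, 0.3$ (read row by row).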
Namely, let $I_{\mathrm{proj}} : C(P_1, Q_1) \to C(P_2, Q_2)$ be defined by:

$$I_{\mathrm{proj}}(S) = \operatorname*{arg\,inf}_{T \in C(P_2, Q_2)} D(T \| S). \tag{1.4}$$

(Above and in the sequel we assume that $P_1, P_2 \in \Gamma_n$ and $Q_1, Q_2 \in \Gamma_m$.) The definition is slightly imprecise in that $I_{\mathrm{proj}}(S)$ can be undefined for some $S \in C(P_1, Q_1)$, i.e., the domain of $I_{\mathrm{proj}}$ can