Inequalities which connect information divergence with other measures of discrimination or distance between probability distributions are used in information theory and its applications to mathematical statistics, ergodic theory and other scientific fields. We suggest new inequalities of this type, often based on underlying identities. As a consequence we obtain certain improvements of the well-known Pinsker inequality. Our study depends on two measures of discrimination, called capacitory discrimination and triangular discrimination. The discussion contains references to related research and comparison with other measures of discrimination, e.g. Ali-Silvey-Csiszár divergences and, in particular, the Hellinger distance.
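As a concrete illustration (not taken from the paper itself), the following Python sketch computes information divergence, total variation and commonly used forms of the triangular discrimination, capacitory discrimination and Hellinger distance for two discrete distributions. The exact normalizations are assumptions here and may differ from the conventions used in the paper.

```python
import math

def kl_divergence(p, q):
    """Information divergence D(P||Q) in nats (assumes q_i > 0 wherever p_i > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """Total variation V(P, Q) = sum |p_i - q_i| (one common normalization)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def triangular_discrimination(p, q):
    """Delta(P, Q) = sum (p_i - q_i)^2 / (p_i + q_i), a common definition."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)

def capacitory_discrimination(p, q):
    """C(P, Q) = D(P||M) + D(Q||M) with M the midpoint (P + Q)/2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl_divergence(p, m) + kl_divergence(q, m)

def hellinger_distance(p, q):
    """Hellinger distance with H^2 = (1/2) sum (sqrt(p_i) - sqrt(q_i))^2."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
print(kl_divergence(P, Q), total_variation(P, Q),
      triangular_discrimination(P, Q), capacitory_discrimination(P, Q),
      hellinger_distance(P, Q))
```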
In its modern formulation, the Maximum Entropy Principle was promoted by E.T. Jaynes, starting in the mid-fifties. The principle dictates that one should look for a distribution, consistent with available information, which maximizes the entropy. However, this principle focuses only on distributions, and it appears advantageous to bring information-theoretical thinking more prominently into play by also focusing on the "observer" and on coding. This view was brought forward by the second-named author in the late seventies and is the view we will follow up on here. It leads to the consideration of a certain game, the Code Length Game, and, via standard game-theoretical thinking, to a principle of Game Theoretical Equilibrium. This principle is more basic than the Maximum Entropy Principle in the sense that the search for one type of optimal strategies in the Code Length Game translates directly into the search for distributions with maximum entropy. In the present paper we offer a self-contained and comprehensive treatment of the fundamentals of both principles mentioned, based on a study of the Code Length Game. Though new concepts and results are presented, the reading should be instructional and accessible to a rather wide audience, at least if certain mathematical details are left aside at a first reading. The most frequently studied instance of entropy maximization pertains to the Mean Energy Model, which involves a moment constraint related to a given function, here taken to represent "energy". This type of model is very well known from the literature, with hundreds of applications pertaining to several different fields, and will also serve here as an important illustration of the theory. But our approach reaches further, especially regarding the study of continuity properties of the entropy function, and this leads to new results which allow a discussion of models with so-called entropy loss. These results have tempted us to speculate about the development of natural languages. In fact, we are able to relate our theoretical findings to the empirically found Zipf's law, which involves statistical aspects of words in a language. The apparent irregularity inherent in models with entropy loss turns out to imply desirable stability properties of languages.
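To make the moment-constrained setting of the Mean Energy Model concrete, here is a minimal sketch (a generic illustration, not the paper's construction): the maximum entropy distribution under a mean-energy constraint is a Gibbs distribution p_i proportional to exp(-beta E_i), and beta is found numerically. The energy levels and target mean below are invented for the example.

```python
import math

def gibbs(energies, beta):
    """Gibbs/Boltzmann distribution with p_i proportional to exp(-beta * E_i)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    p = gibbs(energies, beta)
    return sum(pi * e for pi, e in zip(p, energies))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_given_mean(energies, target, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection on beta: mean energy is decreasing in beta, so bisection applies."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_energy(energies, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return gibbs(energies, (lo + hi) / 2)

E = [1.0, 2.0, 3.0, 4.0]        # hypothetical energy levels
p = maxent_given_mean(E, 2.0)   # maximize entropy subject to mean energy = 2.0
print(p, entropy(p))
```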
Let V and D denote, respectively, total variation and divergence. We study lower bounds of D with V fixed. The theoretically best (i.e. largest) lower bound determines a function L = L(V), Vajda's tight lower bound, cf. Vajda [?]. The main result is an exact parametrization of L. This leads to Taylor polynomials which are lower bounds for L, and thereby extensions of the classical Pinsker inequality which has numerous applications, cf. Pinsker [?] and followers.
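As a minimal numerical illustration of the classical Pinsker inequality in the form D >= V^2/2 (with V the sum of absolute differences and D measured in nats), the following sketch spot-checks the bound on random pairs of distributions. The refined bounds derived in the paper are not reproduced here.

```python
import math, random

def kl_divergence(p, q):
    """D(P||Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """V(P, Q) = sum |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_distribution(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Spot-check Pinsker's inequality D >= V^2 / 2 on random pairs of distributions.
for _ in range(1000):
    P, Q = random_distribution(5), random_distribution(5)
    D, V = kl_divergence(P, Q), total_variation(P, Q)
    assert D >= V * V / 2 - 1e-12
print("Pinsker bound D >= V^2/2 held on all sampled pairs.")
```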
To any discrete probability distribution P we can associate its entropy H(P) = −∑_i p_i ln p_i and its index of coincidence IC(P) = ∑_i p_i². The main result of the paper is the determination of the precise range of the map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (P_max, H(P)), where P_max is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error 1 − P_max rather than P_max, can be conceived as limiting cases of results obtained by the methods presented here. Ranges of maps as those indicated are called Information Diagrams. The main result gives rise to precise lower as well as upper bounds for the entropy function. Some of these bounds are essential for the exact solution of certain problems of universal coding and prediction for Bernoulli sources. Other applications concern Shannon theory (relations between various measures of divergence), statistical decision theory and rate distortion theory. Two methods are developed. One is topological; the other involves convex analysis and is based on a "lemma of replacement" which is of independent interest in relation to problems of optimization of mixed type (concave/convex optimization).
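As a small illustration of the quantities involved (a sketch with an arbitrary sample of distributions, not a computation of the paper's exact range), the following evaluates the map P ↦ (IC(P), H(P)) on a few randomly drawn distributions.

```python
import math, random

def entropy(p):
    """H(P) = -sum p_i ln p_i (nats)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def index_of_coincidence(p):
    """IC(P) = sum p_i^2."""
    return sum(pi * pi for pi in p)

def random_distribution(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Sample points of the map P -> (IC(P), H(P)); the paper determines its exact range.
for _ in range(5):
    P = random_distribution(4)
    print(round(index_of_coincidence(P), 4), round(entropy(P), 4))
```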