In this paper we consider robust parameter estimation based on a certain cross entropy and divergence. The robust estimate is defined as the minimizer of the empirically estimated cross entropy. It is shown that the robust estimate can be regarded as a kind of projection from the viewpoint of a Pythagorean relation based on the divergence. This property implies that the bias caused by outliers can become sufficiently small even in the case of heavy contamination. It is seen that the asymptotic variance of the robust estimator is naturally overweighted in proportion to the ratio of contamination. One may surmise that another form of cross entropy can present the same behavior as that discussed above. It can be proved under some conditions that no cross entropy can present the same behavior except for the cross entropy considered here and its monotone transformation.
We aim at an extension of AdaBoost to U-Boost, in the paradigm to build a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry extended to the space of the finite measures over a label set. We propose two versions of U-Boost learning algorithms by taking account of whether the domain is restricted to the space of probability functions. In the sequential step, we observe that the two adjacent and the initial classifiers are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm as seen in the expectation-maximization algorithm. Statistical discussions for consistency and robustness elucidate the properties of the U-Boost methods based on a stochastic assumption for training data.
Kullback-Leibler divergence and the Neyman-Pearson lemma are two fundamental concepts in statistics. Both are about likelihood ratios: Kullback-Leibler divergence is the expected log-likelihood ratio, and the Neyman-Pearson lemma is about error rates of likelihood ratio tests. Exploring this connection gives another statistical interpretation of the Kullback-Leibler divergence in terms of the loss of power of the likelihood ratio test when the wrong distribution is used for one of the hypotheses. In this interpretation, the standard non-negativity property of the Kullback-Leibler divergence is essentially a restatement of the optimal property of likelihood ratios established by the Neyman-Pearson lemma. The asymmetry of Kullback-Leibler divergence is overviewed in information geometry.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.