Equivalence of distance-based and RKHS-based statistics in hypothesis testing

Sejdinović, Dino; Sriperumbudur, Bharath K.; Gretton, Arthur; Fukumizu, Kenji

doi:10.1214/13-aos1140

Cited by 490 publications

(504 citation statements)

References 20 publications

Supporting

Mentioning

503

Contrasting

Order By: Relevance

“…Firstly, there is no reason to stick to only the Euclidean norm · 2 to measure distances for ED-the test can be extended to other norms, and in fact also other metrics; Reference [40] explains the details for the closely related independence testing problem. Following that, Reference [41] discusses the relationship between distances and kernels (again for independence testing, but the same arguments also hold in the two-sample testing setting). Loosely speaking, for every kernel k, there exists a metric d (and also vice versa), given by d(x, y) := (k(x, x) + k(y, y))/2 − k(x, y), such that MMD with kernel k equals ED with metric d. This is a very strong connection between these two families of tests-the energy distance is a special case of the kernel MMD, corresponding to a particular choice of kernel, and the kernel MMD itself corresponds to an extremely smoothed Wasserstein distance, for a particular choice of distance.…”

Section: From Energy Distance To Kernel Maximum Mean Discrepancymentioning

confidence: 99%

On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

Ramdas

Trillos

Cuturi

2017

Entropy

359

248

View full text Add to dashboard Cite

Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov-Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.

show abstract

Section: From Energy Distance To Kernel Maximum Mean Discrepancymentioning

confidence: 99%

On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

Ramdas

Trillos

Cuturi

2017

Entropy

359

248

View full text Add to dashboard Cite

show abstract

“…An important generalization of distance correlation is Sejdinovic et al (2013). This is related to a generalized distance correlation where the distance is a more general metric than the Euclidean one.…”

Section: New Axiomsmentioning

confidence: 99%

Four simple axioms of dependence measures

Móri

Székely

2018

Metrika

View full text Add to dashboard Cite

Recently new methods for measuring and testing dependence have appeared in the literature. One way to evaluate and compare these measures with each other and with classical ones is to consider what are reasonable and natural axioms that should hold for any measure of dependence. We propose four natural axioms for dependence measures and establish which axioms hold or fail to hold for several widely applied methods. All of the proposed axioms are satisfied by distance correlation. We prove that if a dependence measure is defined for all bounded nonconstant real valued random variables and is invariant with respect to all one-to-one measurable transformations of the real line, then the dependence measure cannot be weakly continuous. This implies that the classical maximal correlation cannot be continuous and thus its application is problematic. The recently introduced maximal information coefficient has the same disadvantage. The lack of weak continuity means that as the sample size increases the empirical values of a dependence measure do not necessarily converge to the population value.

show abstract

“…K is called the distance kernel [17]. The map ϕ : X → H k , ϕ(x) : x → K(·, x) is the canonical feature map.…”

Section: Appendix A: Proof Of Theoremmentioning

confidence: 99%

“…Energy distance may be interpreted as a special case of MMD with a particular kernel function [17], and thus our contribution in this paper can be regarded as providing a practical choice of the kernel function to the MMD-based method. Since the proposed method does not have any tuning parameter, it is extremely simple and computationally highly efficient.…”

Section: Introductionmentioning

confidence: 99%

“…Below, we propose to use the energy distance in class-prior estimation, which we will demonstrate to be practically useful in the next section. Actually, the energy distance was shown to be a special case of MMD [17], meaning that MMD with a certain choice of kernels is reduced to the energy distance. Therefore, our contribution in this paper can be regarded as providing a practical choice of the kernel function in the MMD-based method.…”

mentioning

confidence: 99%

See 1 more Smart Citation

Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance

Kawakubo

Plessis

Sugiyama

2016

IEICE Trans. Inf. & Syst.

View full text Add to dashboard Cite

Hideko KAWAKUBO†a) , Marthinus Christoffel DU PLESSIS † †b) , Nonmembers, and Masashi SUGIYAMA † †c) , Member SUMMARYIn many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy. key words: class balance change, class-prior estimation, energy distance IntroductionA fundamental assumption in supervised machine learning is that training and test data follow the same probability distribution. However, in real-world data, this assumption does not necessarily hold due to intrinsic sample selection bias and non-stationarity of the environment [1], and naive training yields a biased solution [2]. In this paper, we consider the situation called the class balance change in classification [3], where only the class-prior probabilities change between the training and test phases. In principle, the bias caused by the class balance change can be corrected by weighted training according to the class ratio of the test data. However, in practice, the test class balance is often unknown and thus it needs to be estimated from data.So far, semi-supervised class balance estimators from labeled training samples and unlabeled test samples have been developed, which are based on fitting a mixture of class-wise training input distributions to the test input distribution. A seminal method [4] adopts the expectationmaximization (EM) algorithm [5] to estimate the class ratio. Another earlier paper [3] showed that the EM-based method can be interpreted as indirectly fitting a mixture of classwise training input distributions to the test input distribu- The divergence-based methods reviewed above [3], [11] are equipped with cross-validation (CV), and therefore all tuning parameters can be objectively optimized. Thanks to this property, the divergence-based methods work very well in practice, although CV is computationally rather expensive. On the other hand, choosing a kernel function in the MMD-based method is not straightforward because changing the kernel function corresponds to changing the error metric and thus CV cannot be employed. Using the median distance of samples as the Gaussian kernel width is a popular heuristic in MMD [12], but this can cause significant performance degradation in practice [15]. Using MKL for MMD is potentially powerful, but this implementation is computationally highly...

show abstract

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

Cited by 490 publications

References 20 publications

On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

Four simple axioms of dependence measures

Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance

Contact Info

Product

Resources

About