2021
DOI: 10.48550/arxiv.2110.15073
Preprint
MMD Aggregated Two-Sample Test

Abstract: We propose a novel nonparametric two-sample test based on the Maximum Mean Discrepancy (MMD), which is constructed by aggregating tests with different kernel bandwidths. This aggregation procedure, called MMDAgg, ensures that test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power), or arbitrary kernel choices such as the median heuristic. We work in the non-asymptotic framework, and prove that our aggregated test …
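The aggregation idea in the abstract can be sketched in code. This is a minimal illustrative sketch, not the paper's exact algorithm: it runs one permutation test per Gaussian bandwidth and combines them with a simple Bonferroni correction, whereas MMDAgg uses a sharper data-driven correction of the individual levels. All function names and parameters below are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Gaussian kernel matrix from pairwise squared distances.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth):
    # Biased (V-statistic) estimate of the squared MMD.
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def mmd_agg_test(X, Y, bandwidths, alpha=0.05, n_perm=200, seed=0):
    # Simplified aggregated test: a permutation test per bandwidth,
    # combined via Bonferroni (the paper's correction is less crude).
    rng = np.random.default_rng(seed)
    m, Z = len(X), np.vstack([X, Y])
    for bw in bandwidths:
        stat = mmd2_biased(X, Y, bw)
        perm_stats = []
        for _ in range(n_perm):
            idx = rng.permutation(len(Z))
            perm_stats.append(mmd2_biased(Z[idx[:m]], Z[idx[m:]], bw))
        p_val = (1 + sum(s >= stat for s in perm_stats)) / (n_perm + 1)
        if p_val <= alpha / len(bandwidths):
            return True  # reject the null for at least one bandwidth
    return False
```

The point of aggregating is visible here: no single bandwidth has to be chosen in advance, and no data is held out to select one.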

Cited by 9 publications (29 citation statements)
References 18 publications
“…Conversely, the wild bootstrap has the advantage of not requiring sampling from p, which makes it computationally more efficient as only one kernel matrix needs to be computed, but it only achieves the desired level α asymptotically (Shao, 2010; Leucht & Neumann, 2013; Chwialkowski et al, 2014; 2016). Note that we cannot obtain a non-asymptotic level for the wild bootstrap by relying on the result of Romano & Wolf (2005, Lemma 1) as done in the two-sample framework by Fromont et al (2013) and Schrab et al (2021). This is because in our case K_k and KSD^2_{p,k}(X_N) are not exchangeable variables under the null hypothesis, due to the asymmetry of the KSD statistic with respect to p and q.…”
Section: Single Test
confidence: 88%
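The computational point in the excerpt above — that the wild bootstrap reuses a single kernel matrix, reweighting it with Rademacher signs instead of recomputing kernels — can be sketched as follows. This is a hedged illustration; `wild_bootstrap_pvalue` is a hypothetical name, and `H` stands for the Stein (or centred) kernel matrix computed once from the data, whose mean is the observed V-statistic.

```python
import numpy as np

def wild_bootstrap_pvalue(H, n_boot=500, seed=0):
    # H: n x n kernel matrix computed once from the data.
    # Each bootstrap replicate reuses H, flipping signs with
    # Rademacher variables, so no further kernel evaluations
    # are needed (the efficiency advantage noted in the excerpt).
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    stat = H.mean()  # observed V-statistic
    boot = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        boot[b] = eps @ H @ eps / n**2
    return (1 + np.sum(boot >= stat)) / (n_boot + 1)
```

As the excerpt notes, this calibration is only valid asymptotically in the KSD setting, since the signed replicates are not exchangeable with the observed statistic under the null.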
“…In this work, we focus on aggregated tests, which have been investigated for the two-sample problem by Fromont et al (2013), Kim et al (2020) and Schrab et al (2021) using the Maximum Mean Discrepancy (MMD, Gretton et al, 2012a) and for the independence problem by Albert et al (2019) and Kim et al (2020) using the Hilbert-Schmidt Independence Criterion (HSIC, Gretton et al, 2005). We extend the use of aggregated tests to the goodness-of-fit setting, where we are given a model and some samples, and we are interested in deciding whether the samples have been drawn from the model.…”
Section: Introduction
confidence: 99%
“…The recent work of [14] and [19] in two-sample testing shares a similar goal of devising nonparametric tests for equality of distribution between two sets of samples. Rather than designing and optimizing specific kernels for use with Maximum Mean Discrepancy-based tests, this work focuses on a class of tests amenable to many distance metrics and system performance criteria that are also practical to implement and easy to evaluate.…”
Section: Related Work
confidence: 99%
“…As noted in Lehmann and Romano (2006), the difference between p_perm and its Monte Carlo approximation can be made arbitrarily small by taking a sufficiently large number of Monte Carlo samples. This can be formally stated using the Dvoretzky–Kiefer–Wolfowitz inequality (Dvoretzky et al, 1956), and we refer to Corollary 6.1 of Kim (2021) or Proposition 4 of Schrab et al (2021) for such an argument.…”
Section: Algorithm 1 Local Permutation Procedures
confidence: 99%
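The Monte Carlo approximation discussed in the excerpt above can be sketched in code. This is an illustrative sketch (both function names are hypothetical): the first function estimates the permutation p-value from B random relabelings, and the second gives the Dvoretzky–Kiefer–Wolfowitz error level sqrt(log(2/δ)/(2B)), which bounds the deviation of the empirical estimate with probability at least 1 − δ.

```python
import numpy as np

def mc_permutation_pvalue(stat_fn, X, Y, n_perm=1000, seed=0):
    # Monte Carlo approximation of the permutation p-value p_perm:
    # draw n_perm random relabelings of the pooled sample and count
    # how often the recomputed statistic exceeds the observed one.
    rng = np.random.default_rng(seed)
    m, Z = len(X), np.vstack([X, Y])
    observed = stat_fn(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        if stat_fn(Z[idx[:m]], Z[idx[m:]]) >= observed:
            count += 1
    return (1 + count) / (n_perm + 1)

def dkw_error_bound(n_perm, delta=0.01):
    # DKW inequality: with probability at least 1 - delta, the
    # Monte Carlo estimate deviates from p_perm by at most this
    # amount (up to the +1 finite-sample correction terms).
    return np.sqrt(np.log(2 / delta) / (2 * n_perm))
```

With n_perm = 1000 and δ = 0.01, the bound is about 0.05, illustrating the excerpt's point that the approximation error shrinks as the number of Monte Carlo samples grows.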