Mixtures of high-dimensional Gaussian distributions have been studied extensively in statistics and learning theory. While the total variation distance appears naturally in the sample complexity of distribution learning, it is analytically difficult to obtain tight lower bounds for mixtures. Exploiting a connection between the total variation distance and the characteristic function of the mixture, we provide fairly tight functional approximations. This enables us to derive new lower bounds on the total variation distance between pairs of two-component Gaussian mixtures that have a shared covariance matrix.
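For context, the connection mentioned in the abstract rests on the fact that a Gaussian mixture has an explicit characteristic function. As a standard identity (not stated in this excerpt), for a mixture f = ∑ᵢ wᵢ N(µᵢ, Σᵢ):

```latex
\[
  \varphi_f(t) \;=\; \mathbb{E}_{X \sim f}\!\left[e^{i\langle t, X\rangle}\right]
  \;=\; \sum_{i=1}^{k} w_i \exp\!\Big(i\, t^{\top}\mu_i \;-\; \tfrac{1}{2}\, t^{\top}\Sigma_i\, t\Big),
  \qquad t \in \mathbb{R}^d,
\]
```

i.e., the characteristic function is the corresponding mixture of Gaussian characteristic functions, which makes it a tractable proxy for the density.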
Introduction

Let N(µ, Σ) denote the d-dimensional Gaussian distribution with mean µ ∈ R^d and positive definite covariance matrix Σ ∈ R^{d×d}. A k-component Gaussian mixture is a distribution with density

f(x) = ∑_{i=1}^{k} w_i N(x; µ_i, Σ_i),

where w_i ∈ R_+ with ∑_{i=1}^{k} w_i = 1 are the mixing weights, µ_i ∈ R^d are the means, and Σ_i ∈ R^{d×d} are the covariance matrices. Mixtures of Gaussian distributions have been studied intensively due to their broad applicability to statistical problems [2,10,11,21,22,28,29,31,32]. The variational distance (a.k.a. the total variation (TV) distance) between two distributions f, f′ with the same sample space Ω and sigma-algebra S is defined as follows:
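As a concrete numerical illustration (a sketch, not the paper's method), the TV distance between two one-dimensional Gaussian mixtures with a shared scale σ can be estimated as half the L1 distance between their densities; the function names and the grid/truncation choices below are our own assumptions:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, weights, means, sigma):
    """Density of a Gaussian mixture with shared scale sigma (shared covariance in 1-d)."""
    return sum(w * gaussian_pdf(x, m, sigma) for w, m in zip(weights, means))

def tv_distance(weights1, means1, weights2, means2, sigma, n=200001):
    """Estimate TV(f, f') = (1/2) * integral |f - f'| dx by a Riemann sum.

    The integration window is truncated 10 sigma beyond the extreme means,
    where the remaining mass is negligible.
    """
    lo = min(list(means1) + list(means2)) - 10.0 * sigma
    hi = max(list(means1) + list(means2)) + 10.0 * sigma
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    f = mixture_pdf(x, weights1, means1, sigma)
    g = mixture_pdf(x, weights2, means2, sigma)
    return 0.5 * np.sum(np.abs(f - g)) * dx
```

For instance, two identical mixtures give a TV distance of 0, while two unit-variance Gaussians with means far apart give a value approaching 1; two-component mixtures with slightly perturbed means fall strictly in between, which is exactly the regime where tight lower bounds are hard to obtain analytically.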