Hidden Integrality and Semirandom Robustness of SDP Relaxation for Sub-Gaussian Mixture Model

Fei, Yingjie; Chen, Yudong

doi:10.1287/moor.2021.1216

Cited by 2 publications (3 citation statements)

References 23 publications


“…Existing algorithms [71,24,85,75,34,46,23]… Table 1: Comparison of sample complexity, misclassification rate, and computational complexity under different signal strength assumptions. Since SNR ≥ S, the bounds in the second column imply those in the third column.…”

Section: Algorithm Sample Complexity (mentioning, confidence: 99%)


Clustering a Mixture of Gaussians with Unknown Covariance

Davis; Díaz; Wang (2021). Preprint.


We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a k-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.
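The abstract notes that the Max-Cut program appears computationally intractable. As a rough illustration only, the sketch below is not the authors' program (theirs is derived from maximum likelihood under an unknown covariance); it assumes identity covariance and plain Euclidean distances. Under that assumption, for two equal-size clusters, minimizing the 2-means within-cluster sum of squares is equivalent to maximizing the cut weight Σ_{i∈A, j∈B} ‖x_i − x_j‖², and an exact solver must enumerate C(n−1, n/2−1) = C(n, n/2)/2 balanced partitions. The function names are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_max_cut(points):
    """Exhaustively search all equal-size bipartitions (A, B) and return the
    one maximizing sum_{i in A, j in B} ||x_i - x_j||^2.

    With |A| = |B| this is equivalent to minimizing the 2-means
    within-cluster sum of squares (Euclidean case, an assumption here).
    """
    n = len(points)
    assert n % 2 == 0, "need an even number of points"
    best_cut, best_labels = float("-inf"), None
    # Fix point 0 in part A so each partition is visited exactly once.
    for rest in combinations(range(1, n), n // 2 - 1):
        part_a = {0, *rest}
        cut = sum(sq_dist(points[i], points[j])
                  for i in part_a for j in range(n) if j not in part_a)
        if cut > best_cut:
            best_cut = cut
            best_labels = [1 if i in part_a else -1 for i in range(n)]
    return best_labels, best_cut


labels, cut = balanced_max_cut([(0, 0), (0, 1), (5, 5), (5, 6)])
print(labels, cut)  # [1, 1, -1, -1] 202
```

Already for n = 40 points the loop visits C(39, 19) ≈ 6.9 × 10^10 partitions, which illustrates why exhaustive search is not an option and why polynomial-time alternatives such as the spectral algorithm matter.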


“…In particular, when S ≫ 1 and n = Ω(d), it is known that Lloyd's algorithm [71,24], semi-definite relaxations of k-means [85,75,34,46,23] and spectral algorithms [1] achieve an error rate of exp(−Ω(S)). This rate depends suboptimally on SNR.…”

Section: Introduction (mentioning, confidence: 99%)
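Lloyd's algorithm, mentioned above, alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The sketch below is a minimal pure-Python version. Its deterministic farthest-first initialization is an assumption made here for reproducibility, not the initialization analyzed in [71,24]. Because cluster labels are identifiable only up to permutation, the hypothetical helper `misclassification_rate` scores a labeling by minimizing over all k! relabelings, the usual convention for clustering error rates.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Assumed seeding: start at points[0], then repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def lloyd(points, k, max_iter=100):
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are fixed too
        labels = new_labels
        # Update step: move each center to the mean of its points
        # (a center with no points is left where it is).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def misclassification_rate(true_labels, pred_labels, k):
    """Fraction of mislabeled points, minimized over all k! relabelings."""
    n = len(true_labels)
    return min(sum(t != perm[p] for t, p in zip(true_labels, pred_labels))
               for perm in permutations(range(k))) / n


points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
truth = [1, 1, 1, 0, 0, 0]
labels, centers = lloyd(points, k=2)
print(labels, misclassification_rate(truth, labels, k=2))  # [0, 0, 0, 1, 1, 1] 0.0
```

The brute-force minimum over permutations costs k! evaluations, which is fine for small k; for larger k one would normally solve the matching as an assignment problem instead.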


“…signed clustering [18]. Note further that the problem of recovering well-separated clusters has been investigated through the lens of semidefinite programming [19][20][21][22][23], which gives an error bound on the recovered cluster matrix or, even better, proves exact recovery of the cluster labels (up to label permutations). It has also been addressed using non-negative matrix factorisation [24], which gives a probably approximately correct (PAC) Bayesian approach to the problem.…”

Section: Challenges and Achieved Results (mentioning, confidence: 99%)
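The SDP approaches quoted above work with a cluster matrix rather than with labels. One widely used formulation is the k-means relaxation of Peng and Wei: minimize ⟨D, Z⟩ subject to Z ⪰ 0, Z ≥ 0 entrywise, Z1 = 1 and tr(Z) = k, where D_ij = ‖x_i − x_j‖². The excerpt does not say whether [19]-[23] use exactly this program, so treat it as an assumption. The sketch builds the normalized cluster matrix (Z_ij = 1/|C_k| when i and j share cluster k, else 0), checks its linear constraints (Z ⪰ 0 holds by construction, since each block is a scaled all-ones matrix), and confirms that ⟨D, Z⟩/2 equals the k-means within-cluster sum of squares. "Exact recovery" in this setting typically means the SDP optimum coincides with this matrix for the true clusters. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def sq_dist(a, b):
    """Squared Euclidean distance; exact for integer or Fraction inputs."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_matrix(labels):
    """Z[i][j] = 1/|C_k| if points i and j are both in cluster k, else 0."""
    sizes = Counter(labels)
    n = len(labels)
    return [[Fraction(1, sizes[labels[i]]) if labels[i] == labels[j] else Fraction(0)
             for j in range(n)] for i in range(n)]


def check_linear_constraints(Z, k):
    """Symmetry, entrywise nonnegativity, Z1 = 1 and tr(Z) = k."""
    n = len(Z)
    symmetric = all(Z[i][j] == Z[j][i] for i in range(n) for j in range(n))
    nonneg = all(z >= 0 for row in Z for z in row)
    rows_sum_to_one = all(sum(row) == 1 for row in Z)
    trace_is_k = sum(Z[i][i] for i in range(n)) == k
    return symmetric and nonneg and rows_sum_to_one and trace_is_k


def sdp_objective(points, Z):
    """<D, Z> / 2 with D[i][j] = ||x_i - x_j||^2."""
    n = len(points)
    return sum(sq_dist(points[i], points[j]) * Z[i][j]
               for i in range(n) for j in range(n)) / 2


def within_cluster_ss(points, labels):
    """k-means cost; integer coordinates assumed so Fraction means are exact."""
    total = Fraction(0)
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [Fraction(sum(col), len(members)) for col in zip(*members)]
        total += sum(sq_dist(p, center) for p in members)
    return total


points = [(0, 0), (0, 2), (5, 5), (5, 7), (6, 6)]
labels = [0, 0, 1, 1, 1]
Z = cluster_matrix(labels)
print(check_linear_constraints(Z, 2))  # True
print(sdp_objective(points, Z), within_cluster_ss(points, labels))  # 14/3 14/3
```

Exact `Fraction` arithmetic is used so that the identity can be checked with `==` rather than a floating-point tolerance.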

Clustering on Laplacian-embedded latent manifolds when clusters overlap

Chrétien; Jagan; Barton (2020). Meas. Sci. Technol.


The purpose of clustering is to identify groups in a dataset in the hope of revealing some unforeseen latent discrete variable. Clustering, however, is known to be one of the most difficult tasks in practice for two main reasons: (i) the high dimensionality of the data, which requires appropriate feature extraction; and (ii) the computational complexity of the associated optimisation problems.

Spectral clustering was designed as a joint embedding and clustering technique that first embeds the data into a low-dimensional space and then separates the clusters by considering the sign of the components of (a linearly transformed version of) the second eigenvector of a similarity matrix. Hence, spectral clustering seemingly solves the two main challenges associated with clustering problems, at least when the clusters are well separated.

In this paper, we address the question of clustering when clusters overlap. In this regime, one relevant approach to clustering is to consider the modes of the point cloud distribution and, in particular, how the modes of the distribution of the raw data are mapped to the modes of the embedded data via the Laplacian eigenmap.

The main contribution of the present paper is a simulation study of the (approximate) mode-preserving property of Laplacian eigenmaps and of how relevant the mode-chasing approach is to high-dimensional clustering. As a consequence of the mode-preserving property of Laplacian eigenmaps, this method is as good as finding modes in the original high-dimensional space but with much better computational efficiency. The method is illustrated on simulated data and on satellite data relating to ground movement.
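The abstract describes the basic two-way spectral step: split the data by the sign of the second eigenvector of a similarity matrix. A minimal sketch, under stated assumptions: a Gaussian similarity W with a hypothetical bandwidth `sigma`, the unnormalized graph Laplacian L = D − W, and "second eigenvector" read as the eigenvector for the second-smallest eigenvalue of L (the Fiedler vector). The paper's "linearly transformed version" may differ, and its mode-chasing method for overlapping clusters is not shown. To stay dependency-free, the eigenvector is found by power iteration rather than a library eigensolver.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_similarity(points, sigma):
    """W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for i != j, zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def fiedler_vector(W, n_iter=500):
    """Approximate the eigenvector of L = D - W for its second-smallest eigenvalue.

    Power iteration on M = c*I - L, restricted to vectors orthogonal to the
    all-ones vector (L's eigenvector for eigenvalue 0). By Gershgorin,
    c = 2 * max degree bounds L's largest eigenvalue, so M is positive
    semidefinite and its top eigenvector on that subspace is the Fiedler vector.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2 * max(deg)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary deterministic start
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # (c*I - D + W) v, computed row by row
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(points, sigma=1.0):
    """Label each point by the sign of its Fiedler-vector component."""
    v = fiedler_vector(gaussian_similarity(points, sigma))
    return [0 if x >= 0 else 1 for x in v]


points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
print(spectral_bipartition(points))  # [0, 0, 0, 1, 1, 1]
```

With well-separated groups the Fiedler vector is nearly constant on each group, so its sign recovers the split; when clusters overlap, as in the setting the paper studies, this sign rule becomes much less reliable.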

