2019 | Preprint
DOI: 10.48550/arxiv.1911.00538
Optimality of Spectral Clustering in the Gaussian Mixture Model

Abstract: Spectral clustering is one of the most popular algorithms to group high-dimensional data. It is easy to implement and computationally efficient. Despite its popularity and successful applications, its theoretical properties have not been fully understood. The spectral clustering algorithm is often used as a consistent initializer for more sophisticated clustering algorithms. However, in this paper, we show that spectral clustering is actually already optimal in the Gaussian Mixture Model, when the number of clusters…
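To make the procedure concrete, here is a minimal sketch of vanilla spectral clustering on Gaussian mixture data: project the observations onto the leading singular subspace of the data matrix, then run k-means on the projections. This is an illustrative implementation, not necessarily the exact variant analyzed in the paper; the toy mixture and all parameter choices below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, seed=0):
    """Cluster the n rows of X (an n x p data matrix) into k groups:
    project onto the top-k right singular vectors, then run k-means."""
    # Top-k SVD of the data matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Project each observation onto the leading k-dimensional subspace.
    X_proj = X @ Vt[:k].T  # shape (n, k)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_proj)

# Hypothetical toy mixture: two well-separated spherical Gaussian components.
rng = np.random.default_rng(0)
n, p = 200, 50
means = np.zeros((2, p))
means[0, 0], means[1, 0] = 4.0, -4.0
labels = rng.integers(0, 2, size=n)
X = means[labels] + rng.standard_normal((n, p))
print(spectral_clustering(X, k=2)[:10], labels[:10])
```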

Cited by 11 publications (31 citation statements) | References 37 publications
“…It is easy to verify that the rank of M_k(S) is 1, which is smaller than 2, the number of clusters at mode k. The above example has non-zero separation ∆²_min(S) = 16, so the two clusters on each mode are still identifiable. Similar phenomena also appear in matrix biclustering and even vector clustering (Löffler et al., 2019). The vanishing singular value gap impedes the use of classical matrix/tensor perturbation theory and makes the analysis more difficult.…”
Section: Assumptions
confidence: 79%
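The quoted example is not reproduced in full here, but a hypothetical vector-clustering analogue illustrates the point: when two cluster centers are negatives of each other, the stacked mean matrix has rank 1, fewer than the number of clusters, even though the separation ∆²_min = 16 is non-zero.

```python
import numpy as np

# Hypothetical analogue of the quoted example: two cluster centers that
# are negatives of each other. The stacked mean matrix has rank 1, yet
# the squared separation between the centers is 16, so the clusters
# remain identifiable despite the vanishing singular value gap.
theta = np.array([2.0, 0.0, 0.0])
M = np.vstack([theta, -theta])  # 2 clusters, but rank(M) = 1

print(np.linalg.matrix_rank(M))              # 1 < 2 = number of clusters
print(np.linalg.norm(theta - (-theta)) ** 2)  # separation = 16.0
```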
“…Classic clustering algorithms such as k-means (Jain, 2010) and spectral clustering (Von Luxburg, 2007) have been widely used in statistics and machine learning. In the order-1 (vector) case, the clustering problem reduces to clustering in the Gaussian mixture model, and optimal statistical guarantees have been developed for state-of-the-art clustering algorithms, including spectral clustering (Löffler et al., 2019), the EM algorithm (Wu and Zhou, 2019), and Lloyd's algorithm (Lu and Zhou, 2016). In the order-2 (matrix) case, clustering methods have been studied under the stochastic block model (Abbe, 2017), biclustering (Gao et al., 2016), and bipartite community detection (Zhou and Amini, 2019).…”
Section: Related Literature
confidence: 99%
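For reference, here is a minimal sketch of Lloyd's algorithm, the alternating assignment-and-update iteration cited above; the random initialization and fixed iteration count are illustrative choices, not the scheme analyzed in the cited work.

```python
import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm: alternate nearest-center assignment
    and center (mean) updates on the rows of X."""
    rng = np.random.default_rng(seed)
    # Illustrative initialization: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        d = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        z = d.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            if (z == j).any():
                centers[j] = X[z == j].mean(axis=0)
    return z, centers
```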
“…As a result, the constructed confidence regions CR^{1−α}_{U,l} taken together form a valid set that contains a ground-truth subspace representation in a row-wise valid fashion. There are many applications, e.g., community detection (Rohe et al., 2011) and Gaussian mixture models (Löffler et al., 2019), in which the rows of U might contain crucial operational information. When r = 1, the above theorem provides entrywise confidence intervals for the principal component of interest.…”
Section: Inferential Procedures and Theory for the SVD-based Approach
confidence: 99%
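As an illustration of why the rows of U carry operational information, the sketch below plants two communities in a toy stochastic block model and reads memberships off the sign pattern of a leading eigenvector, row by row. All sizes and edge probabilities are made up for the example; this is not the confidence-region construction of the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
z = rng.integers(0, k, size=n)                    # planted community labels
# Within-community edges appear with prob 0.6, across with prob 0.1.
P = np.where(z[:, None] == z[None, :], 0.6, 0.1)
A = rng.random((n, n)) < P
A = np.triu(A, 1)
A = (A + A.T).astype(float)                       # symmetric adjacency matrix

# Rows of the top-k eigenvector matrix U encode community membership.
w, V = np.linalg.eigh(A)
U = V[:, -k:]                                      # leading eigenvectors, shape (n, k)
# The sign of the second eigenvector separates the two communities here.
acc = np.mean((U[:, 0] > 0) == (z == 1))
print(max(acc, 1 - acc))                           # close to 1.0 in this regime
```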
“…A variety of large-scale data science applications involve extracting actionable knowledge from the eigenvectors of a certain low-rank matrix. Representative examples include principal component analysis (PCA) [Johnstone, 2001], phase synchronization [Singer, 2011], clustering in mixture models [Löffler et al., 2019], and community recovery [Abbe et al., 2020b; Lei et al., 2015], to name just a few. In reality, it is often the case that one only observes a randomly corrupted version of the matrix of interest, and has to retrieve information from the "empirical" eigenvectors (i.e., the eigenvectors of the observed noisy matrix).…”
confidence: 99%
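A small sketch of this setting: build a low-rank matrix, corrupt it with symmetric Gaussian noise, and measure how far the empirical eigenvectors drift from the truth via the sin-theta (principal angle) distance. All dimensions and noise scales below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 300, 2

# Ground-truth rank-r symmetric matrix M = Q diag(lam) Q^T.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = Q @ np.diag([20.0, 15.0]) @ Q.T

# Observe a randomly corrupted version and take its empirical eigenvectors.
E = rng.standard_normal((n, n))
E = (E + E.T) / np.sqrt(2 * n)                  # symmetric noise, operator norm ~2
w, V = np.linalg.eigh(M + E)
U_hat = V[:, -r:]                               # eigenvectors of the noisy matrix

# Sin-theta distance between empirical and true eigenspaces: the singular
# values of U_hat^T Q are the cosines of the principal angles.
_, s, _ = np.linalg.svd(U_hat.T @ Q)
print(np.sqrt(1 - s.min() ** 2))                # small when noise << eigen-gap
```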