2021
DOI: 10.1016/j.acha.2020.03.002

Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations

Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometr…
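
The random walk and diffusion distance mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian kernel, the `bandwidth` parameter, the function names, and the Coifman-Lafon style normalization by the stationary distribution are all assumptions made here, and the paper's exact definitions may differ.

```python
import numpy as np


def random_walk_matrix(X, bandwidth):
    """Row-stochastic transition matrix of a random walk on a Gaussian-kernel graph.

    Assumption: similarities are exp(-||x - y||^2 / (2 * bandwidth^2)).
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    degrees = W.sum(axis=1)
    P = W / degrees[:, None]  # each row sums to 1
    return P, degrees


def diffusion_distance_sq(P, degrees, t):
    """Squared diffusion distance between all pairs of points after t steps.

    D_t(x, y)^2 = sum_z (p_t(x, z) - p_t(y, z))^2 / pi(z),
    where pi is the stationary distribution of the walk.
    """
    Pt = np.linalg.matrix_power(P, t)
    stationary = degrees / degrees.sum()
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sum(diff ** 2 / stationary, axis=-1)
```

Two points are close in this distance when their t-step transition distributions nearly coincide, which happens when many short paths connect them. Larger `t` probes coarser scales, which is one way to read the "multi-scale" description above.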

Cited by 12 publications (6 citation statements)
References 34 publications
“…Many existing methods mimic diffusion or use Monte Carlo for information processing [35, 36]. Indeed, the label propagation in Forest Fire Clustering represents a simplified version of diffusion and is similar to region-growing algorithms for image processing.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…[88] Chen and Yang introduced diffusion k-means, which maximizes the within-cluster connectedness based on the diffusion distance [61]. The diffusion distance is defined as the Euclidean distance in the eigenvector space of diffusion map embedding [60, 72, 73]. In other words, diffusion k-means is k-means clustering applied to diffusion map embeddings.…”
Section: Methods
Citation type: mentioning
confidence: 99%
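
The description in this statement (k-means applied to diffusion map embeddings) can be sketched as below. It is a hedged illustration rather than the method of Chen and Yang, whose paper analyzes a semidefinite relaxation: the Gaussian kernel, the `bandwidth` and `t` parameters, the function names, and the use of plain Lloyd iterations are assumptions introduced here.

```python
import numpy as np


def diffusion_map(X, bandwidth, n_components, t=1):
    """Diffusion map embedding from a Gaussian-kernel random walk (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    d = W.sum(axis=1)
    # S = D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as P = D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P, scaled so that Euclidean distances in the full
    # embedding equal diffusion distances weighted by the stationary distribution.
    psi = eigvecs * np.sqrt(d.sum()) / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t


def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd iterations with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def diffusion_kmeans(X, k, bandwidth, n_components=None, t=1):
    """Cluster X by running k-means on its diffusion map embedding."""
    if n_components is None:
        n_components = k - 1
    return kmeans(diffusion_map(X, bandwidth, n_components, t), k)
```

A typical use is on data such as two concentric rings, where Euclidean k-means on the raw coordinates cannot separate the rings but the leading diffusion coordinate is nearly constant on each ring. The bandwidth must be small enough that cross-ring similarities are negligible yet large enough that each ring stays connected.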
“…Our analysis uses a permutation invariant pairwise metric that we supply as the kernel to perform nonlinear dimensionality reduction using diffusion maps [60]. We use diffusion k-means [61] to define the microstate clustering, and robust Perron cluster cluster analysis (PCCA+) [62–66] to define the macrostate clustering and build MSMs. We test the Markovianity of the macrostate MSMs using the Chapman–Kolmogorov (CK) test to verify that they are valid kinetic models of the non-equilibrium OM system and provide post hoc support for the use of permutationally-invariant diffusion map embeddings to identify and resolve microstates.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Fast approximation algorithms to solve the K-means such as Lloyd's algorithm [19,20] and spectral clustering [22,26,31,2,32,33] provably yield consistent recovery when different groups are well separated. Recently, semi-definite programming (SDP) relaxations [27,23,18,13,11,29,14,9] have emerged as an important approach for clustering due to its superior empirical performance [27], robustness against outliers and adversarial attack [13], and attainment of the information-theoretic limit [10]. Despite having polynomial time complexity, the SDP relaxed K-means has notoriously poor scalability to large (or even moderate) datasets for instance by interior point methods [3,15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
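
For orientation, one widely used SDP relaxation of K-means (usually attributed to Peng and Wei) can be written as below. This is offered as background under stated assumptions: the statement quoted above does not specify which formulation it has in mind, and the symbols $A$ (an $n \times n$ affinity matrix, for example a Gram matrix of the data), $Z$, $n$, and $K$ are notation introduced here.

```latex
\begin{aligned}
\hat{Z} \in \arg\max_{Z \in \mathbb{R}^{n \times n}} \quad & \langle A, Z \rangle \\
\text{subject to} \quad & Z \succeq 0, \quad Z \geq 0 \ \text{(entrywise)}, \\
& Z \mathbf{1}_n = \mathbf{1}_n, \quad \operatorname{tr}(Z) = K .
\end{aligned}
```

Any partition $G_1, \dots, G_K$ of the $n$ points gives a feasible point $Z = \sum_{k=1}^{K} |G_k|^{-1} \mathbf{1}_{G_k} \mathbf{1}_{G_k}^{\top}$, so the program relaxes the combinatorial problem. The decision variable has on the order of $n^2$ entries, which is consistent with the scalability concern raised in the quoted statement.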