2019
DOI: 10.1002/sam.11418
|View full text |Cite
|
Sign up to set email alerts
|

Optimal transport, mean partition, and uncertainty assessment in cluster analysis

Abstract: In scientific data analysis, clusters identified computationally often substantiate existing hypotheses or motivate new ones. Yet the combinatorial nature of the clustering result, which is a partition rather than a set of parameters or a function, blurs the notions of mean and variance. This intrinsic difficulty hinders the development of methods that improve clustering by aggregation or that assess the uncertainty of the clusters generated. We overcome that barrier by aligning clusters via optimal transport. Equipped w…
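Aligning the clusters of two results via optimal transport can be sketched as a small linear program. This is a minimal sketch under stated assumptions, not the OTclust API: each cluster is weighted by its share of the points, the cost between two clusters is one minus their Jaccard index, and the function names `jaccard_cost` and `partition_ot` are hypothetical. The transport plan says how much of each cluster in one result corresponds to each cluster in the other, and the optimal cost serves as a distance between the two partitions.

```python
import numpy as np
from scipy.optimize import linprog


def jaccard_cost(labels_a, labels_b):
    """Cost C[i, j] = 1 - Jaccard index of cluster i of A and cluster j of B."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    clusters_a, clusters_b = np.unique(labels_a), np.unique(labels_b)
    cost = np.empty((len(clusters_a), len(clusters_b)))
    for i, ca in enumerate(clusters_a):
        in_a = labels_a == ca
        for j, cb in enumerate(clusters_b):
            in_b = labels_b == cb
            cost[i, j] = 1.0 - np.sum(in_a & in_b) / np.sum(in_a | in_b)
    return cost


def partition_ot(labels_a, labels_b):
    """Exact OT between two partitions of the same n points (hypothetical helper).

    Returns (distance, plan); plan[i, j] is the mass moved from cluster i of A
    to cluster j of B. Clusters are weighted by their point proportions.
    """
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    _, counts_a = np.unique(labels_a, return_counts=True)
    _, counts_b = np.unique(labels_b, return_counts=True)
    p, q = counts_a / counts_a.sum(), counts_b / counts_b.sum()
    cost = jaccard_cost(labels_a, labels_b)
    m, n = cost.shape
    # The plan is flattened row-major: variable i * n + j holds plan[i, j].
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # row sums must equal p
    for j in range(n):
        a_eq[m + j, j::n] = 1.0  # column sums must equal q
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(m, n)


# Cluster 0 of A splits into clusters 0 and 1 of B; cluster 1 of A matches cluster 2 of B.
dist, plan = partition_ot([0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 2, 2])
print(round(dist, 4))  # prints 0.3333
print(np.round(plan, 3))
```

A plain relabeling (for example `[0, 0, 1, 1]` versus `[1, 1, 0, 0]`) gives distance 0, which is why an OT alignment, rather than a comparison of raw label values, is needed before results can be averaged.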

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
20
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
4
1
1

Relationship

1
5

Authors

Journals

Cited by 17 publications (20 citation statements); references 31 publications.

“…Thus, they share a common challenge, the lack of a gold standard to assess quality and performance, and the lack of methods for the selection of the number of groups (modules/clusters). The notion of stability has been used extensively in the clustering literature as a surrogate for performance [4, 12–14, 20, 22, 26, 38]. It is therefore natural to bridge stability estimation for clustering to the module detection problem in graphs.…”
Section: Discussion (mentioning, confidence: 99%)
See 1 more Smart Citation
“…Thus, they share a common challenge, the lack of a gold standard to assess quality and performance, and the lack of methods for the selection of the number of groups (modules/clusters). The notion of stability has been used extensively in the clustering literature as a surrogate for performance [4, 12–14, 20, 22, 26, 38]. It is therefore natural to bridge stability estimation for clustering to the module detection problem in graphs.…”
Section: Discussionmentioning
confidence: 99%
“…In lieu of a gold standard, different forms of cluster stability have been used as a surrogate to assess performance. Stability estimates capture how stable the clusterings are over several different representations of the data, which are derived through subsetting, cross‐validation, data noising or re‐sampling, among others [4, 12, 14, 15, 20, 22, 26, 36–38].…”
Section: Introduction (mentioning, confidence: 99%)
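Resampling-based stability, one of the surrogates the passage lists, can be sketched in a few lines. This is a minimal sketch under assumptions not taken from the quoted sources: k-means is run with a farthest-point initialization, each bootstrap fit is applied back to all original points, and agreement with a reference clustering is measured with the adjusted Rand index (a swapped-in agreement measure, not the OT-based one of the paper above). All function names are hypothetical.

```python
import numpy as np


def farthest_point_init(x, k, rng):
    """Pick one random point, then repeatedly the point farthest from those chosen."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    return np.array(centers)


def assign(x, centers):
    """Label each point with its nearest center (squared Euclidean distance)."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans_centers(x, k, rng, n_iter=100):
    """Lloyd's algorithm; an empty cluster keeps its previous center."""
    centers = farthest_point_init(x, k, rng)
    for _ in range(n_iter):
        labels = assign(x, centers)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def adjusted_rand_index(a, b):
    """Adjusted Rand index (1.0 means identical up to relabeling)."""
    _, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)

    def pairs(v):
        return v * (v - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(float(len(a_idx)))
    max_index = (sum_rows + sum_cols) / 2.0
    if np.isclose(max_index, expected):
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))


def bootstrap_stability(x, k, n_boot=20, seed=0):
    """Mean ARI between a reference clustering and clusterings of bootstrap samples."""
    rng = np.random.default_rng(seed)
    reference = assign(x, kmeans_centers(x, k, rng))
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        centers = kmeans_centers(x[idx], k, rng)
        scores.append(adjusted_rand_index(reference, assign(x, centers)))
    return float(np.mean(scores))


rng = np.random.default_rng(1)
blobs = np.concatenate([rng.normal(loc, 0.5, size=(30, 2))
                        for loc in [(0, 0), (10, 0), (0, 10)]])
for k in (2, 3, 4):
    print(k, round(bootstrap_stability(blobs, k), 3))
```

Sweeping `k` this way mirrors the use of stability as a surrogate when no gold standard is available: a value of `k` whose clustering is reproduced across resamples scores close to 1. Subsetting or data noising could replace the bootstrap step without changing the rest.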
“…The iris data was examined using the OTclust package (Li et al., 2019; Zhang et al., 2020). Figure 4b shows the stability for k-means as a measure of overall tightness across a range of k values.…”
Section: Approaches To Clustering Stability (mentioning, confidence: 99%)
“…For example, in unsupervised clustering, cluster labels are named arbitrarily only as symbols to distinguish groups. Since clusters generated in multiple results usually do not correspond to each other sharply, OT is used to match clusters in different results [25]. It is sometimes improper to assume that the clustering results are random realizations of one underlying "truth".…”
Section: Optimal Transport With Relaxed Marginal Constraints (mentioning, confidence: 99%)
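When clusters from two results do not correspond one to one, a soft transport plan makes the partial correspondence visible. The sketch below uses entropy-regularized OT solved by Sinkhorn scaling, a swapped-in approximate solver rather than the exact linear program, and optionally relaxes the marginal constraints with KL penalties of weight `rho` (the generalized scaling update with exponent `rho / (rho + eps)`). The function name, the defaults `eps=0.1` and `n_iter=2000`, and the example cost matrix (one minus the Jaccard index for a split cluster) are illustrative assumptions, not taken from the cited works.

```python
import numpy as np


def sinkhorn_plan(p, q, cost, eps=0.1, rho=None, n_iter=2000):
    """Entropy-regularized OT plan via Sinkhorn scaling (hypothetical helper).

    rho=None enforces the marginals p and q (balanced OT). A finite rho replaces
    the marginal constraints with KL penalties of weight rho, so the plan's
    marginals may deviate from p and q at a cost (relaxed marginals).
    Smaller eps gives a sharper plan but needs more iterations to converge.
    """
    p, q, cost = (np.asarray(v, dtype=float) for v in (p, q, cost))
    kernel = np.exp(-cost / eps)
    power = 1.0 if rho is None else rho / (rho + eps)
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        u = (p / (kernel @ v)) ** power
        v = (q / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]


# Cluster 0 of result A (2/3 of the points) splits across clusters 0 and 1 of B;
# cluster 1 of A matches cluster 2 of B. Costs are 1 - Jaccard index.
p = [2 / 3, 1 / 3]
q = [1 / 3, 1 / 3, 1 / 3]
cost = [[0.5, 0.5, 1.0], [1.0, 1.0, 0.0]]
plan = sinkhorn_plan(p, q, cost)
# Row-normalizing shows how each cluster of A is distributed over B.
print(np.round(plan / plan.sum(axis=1, keepdims=True), 3))
relaxed = sinkhorn_plan(p, q, cost, rho=1.0)
print(np.round(relaxed.sum(axis=1), 3), np.round(relaxed.sum(axis=0), 3))
```

The first row of the normalized plan spreads over two clusters of B, which is exactly the kind of non-sharp correspondence the passage describes; a one-to-one label matching would have to discard half of that cluster.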
“…Wasserstein distance has also been used for robust supervised learning [22]–[24]. In the case of unsupervised learning, OT readily applies to the issue of aligning clustering results (the consistent cluster labeling issue), which then forms the basis for ensemble clustering and uncertainty analysis for clustering [25], [26].…”
Section: Introduction (mentioning, confidence: 99%)
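For comparison with the OT-based approach, a common alignment-free baseline for ensemble clustering is the co-association matrix: the fraction of results in which two points share a cluster. This sketch is a swapped-in technique, not the OT-based mean partition of the cited work. It sidesteps label alignment entirely, at the cost of an n-by-n matrix. The 0.5 threshold, the rule of linking points whose co-association exceeds it and taking connected components, and the function names are all assumptions for illustration.

```python
import numpy as np


def co_association(label_sets):
    """Fraction of clustering results in which each pair of points shares a cluster."""
    label_sets = [np.asarray(labels) for labels in label_sets]
    n = len(label_sets[0])
    co = np.zeros((n, n))
    for labels in label_sets:
        co += labels[:, None] == labels[None, :]
    return co / len(label_sets)


def consensus_labels(label_sets, threshold=0.5):
    """Link pairs with co-association above threshold; label connected components."""
    co = co_association(label_sets)
    n = co.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if co[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Three results: the second is a relabeling of the first; the third moves point 2.
results = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [0, 0, 0, 1, 2, 2]]
print(consensus_labels(results))
```

Because co-association only records whether pairs agree, it cannot say which cluster of one result corresponds to which cluster of another; that per-cluster correspondence is what the OT alignment supplies for uncertainty analysis.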