2020
DOI: 10.48550/arxiv.2002.12538
Preprint

Explainable $k$-Means and $k$-Medians Clustering

Sanjoy Dasgupta,
Nave Frost,
Michal Moshkovitz
et al.

Abstract: Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We study this problem from a theoretical viewpoint, measuring cluster quality by the k…

Cited by 5 publications (23 citation statements)
References: 22 publications
“…We address the challenge of obtaining a low cost k-means clustering using a small decision tree. Our approach has roots in previous works on clustering with unsupervised decision trees [7,11,16,22,23,28,43,61] and in prior literature on extending decision trees for tasks beyond classification [29,30,35,45,60,54,55].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent work on explainable clustering goes one step further by enforcing that the clustering be derived from a binary threshold tree [11,18,22,28,32,43]. Each node is associated with a feature-threshold pair that recursively splits the dataset, and labels on the leaves correspond to clusters.…”
Section: Introduction (mentioning)
confidence: 99%
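
The threshold-tree structure described in the statement above can be illustrated with a minimal sketch. This is not the authors' algorithm for building the tree; it only shows how an already-built tree assigns points to clusters. The names `Leaf`, `Node`, and `assign`, the convention that values less than or equal to the threshold go left, and the example tree and points are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    cluster: int  # cluster label stored on the leaf


@dataclass
class Node:
    feature: int      # index of the single feature tested at this node
    threshold: float  # split value for that feature
    left: "Tree"      # subtree for points with x[feature] <= threshold (assumed convention)
    right: "Tree"     # subtree for points with x[feature] > threshold


Tree = Union[Leaf, Node]


def assign(tree: Tree, x: Sequence[float]) -> int:
    """Follow feature-threshold tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.cluster


# Hypothetical tree with three leaves, hence three clusters.
tree = Node(0, 0.5, Leaf(0), Node(1, 2.0, Leaf(1), Leaf(2)))
points = [(0.1, 3.0), (0.9, 1.0), (0.9, 5.0)]
labels = [assign(tree, p) for p in points]
```

Each cluster is then described by the short list of threshold tests on the path from the root to its leaf, which is what makes the assignment easy to explain.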