Proceedings of the 2012 SIAM International Conference on Data Mining 2012
DOI: 10.1137/1.9781611972825.8

Cluster-Aware Compression with Provable K-means Preservation

Abstract: This work rigorously explores the design of cluster-preserving compression schemes for high-dimensional data. We focus on the K-means algorithm and identify conditions under which running the algorithm on the compressed data yields the same clustering outcome as on the original. The compression is performed using single and multi-bit minimum mean square error quantization schemes as well as a given clustering assignment of the original data. We provide theoretical guarantees on post-quantization cluster preserv…
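
As a rough illustration of the idea in the abstract, the sketch below applies a 1-bit quantizer to each dimension and then checks whether a nearest-centroid (K-means assignment) step on the compressed data reproduces a given clustering. This is not the paper's scheme: the quantizer here is a plain empirical Lloyd-Max quantizer that ignores the cluster assignment, and the function names (`quantize_1bit`, `quantize_dataset`, `cluster_centroids`, `nearest_centroid`) and the toy data are assumptions made up for this example.

```python
def quantize_1bit(values, iters=20):
    """Empirical 1-bit Lloyd-Max quantizer: map each value to one of two levels."""
    threshold = sum(values) / len(values)  # start from the sample mean
    lo = hi = threshold
    for _ in range(iters):
        below = [v for v in values if v <= threshold]
        above = [v for v in values if v > threshold]
        if not below or not above:  # all values identical: a single level suffices
            break
        lo = sum(below) / len(below)  # centroid condition for each side
        hi = sum(above) / len(above)
        new_threshold = (lo + hi) / 2  # nearest-level condition
        if new_threshold == threshold:
            break
        threshold = new_threshold
    return [lo if v <= threshold else hi for v in values]


def quantize_dataset(points):
    """Quantize every dimension independently with one bit per value."""
    columns = [quantize_1bit(list(col)) for col in zip(*points)]
    return [tuple(row) for row in zip(*columns)]


def cluster_centroids(points, labels, k):
    """Mean of the points carrying each label (assumes no cluster is empty)."""
    dims = len(points[0])
    cents = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        cents.append(tuple(sum(p[d] for p in members) / len(members)
                           for d in range(dims)))
    return cents


def nearest_centroid(points, centroids):
    """K-means assignment step: index of the closest centroid for each point."""
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
            for p in points]


# Hypothetical toy data: two well-separated clusters with a known assignment.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

quantized = quantize_dataset(points)
centroids = cluster_centroids(quantized, labels, k=2)
preserved = nearest_centroid(quantized, centroids) == labels
```

On data this well separated the check passes trivially; the conditions under which preservation holds in general are the subject of the paper and are not captured by this sketch.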

Citations: cited by 1 publication (4 citation statements)
References: 15 publications
“…Similar, in spirit, to our approach is the work of [42]. There, the authors propose 1-bit Minimum Mean…”
Section: K-means Clustering In the Compressed Domain (mentioning)
confidence: 86%
“…k-Clustering Problem: Given a DB containing V compressed representations of x^(i), ∀i, and a target number of clusters k, group the compressed data into k clusters in an accurate way through their compressed representations. This is an assignment problem and is in fact NP-hard [42]. Many approximations to this problem exist, one of the most widely used algorithms being the k-Means clustering algorithm [43].…”
Section: K-means Clustering In the Compressed Domain (mentioning)
confidence: 99%