2013
DOI: 10.1007/978-3-642-40450-4_41

BICO: BIRCH Meets Coresets for k-Means Clustering

Abstract: We design a data stream algorithm for the k-means problem, called BICO, that combines the data structure of the SIGMOD Test of Time award-winning algorithm BIRCH [27] with the theoretical concept of coresets for clustering problems. The k-means problem asks for a set C of k centers minimizing the sum of the squared distances from every point in a set P to its nearest center in C. In a data stream, the points arrive one by one in arbitrary order and there is limited storage space. BICO computes high quality solu…
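
The k-means objective stated in the abstract can be illustrated with a minimal sketch. The function name `kmeans_cost` and the optional `weights` parameter are assumptions made for illustration and are not taken from the paper; the weights reflect the general idea that a coreset is a small weighted point set whose cost approximates that of the full input. This only evaluates the objective for given centers. It is not BICO or BIRCH.

```python
def kmeans_cost(points, centers, weights=None):
    """Sum of (optionally weighted) squared distances from each point
    to its nearest center. Illustrative helper, not from the paper."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted input: every point counts once
    total = 0.0
    for p, w in zip(points, weights):
        # Squared Euclidean distance from p to its closest center.
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


# Four points, two centers; each point is at distance 1 from its nearest center.
P = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 1.0), (10.0, 1.0)]
cost_full = kmeans_cost(P, C)

# A hypothetical weighted summary: two representatives, each standing for two points.
S = [(0.0, 0.0), (10.0, 2.0)]
cost_summary = kmeans_cost(S, C, weights=[2.0, 2.0])
```

In this toy case the weighted summary reproduces the cost of the full set for the chosen centers. A real coreset guarantees an approximation for every choice of k centers, which this sketch does not check.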

Cited by 52 publications (47 citation statements)
References 23 publications

“…A natural adaptation of these agglomerative methods is the mini-batch version that PERCH outperforms in our empirical evaluation. BIRCH [38] and its extensions [19] comprise state of the art online hierarchical methods that, like PERCH, incrementally insert points into a cluster tree data structure. However, unlike PERCH, these methods parameterize internal nodes with means and variances as opposed to bounding boxes, and they do not implement rotations, which our empirical and theoretical results justify.…”
Section: Related Work (mentioning)
confidence: 99%
“…Many of these decompositions are based on packing arguments and range spaces. We will illustrate one approach via the k-means problem, based on the papers by Har-Peled and Mazumdar [58] and Fichtenberger et al [51].…”
Section: Geometric Decompositions (mentioning)
confidence: 99%
“…The geometric approach was also popular when coresets were introduced to k-median and k-means clustering and generalizations, see for instance [44,51,52,57,58]. Due to the exponential dependency inherent in all known constructions, the focus later shifted to sampling.…”
Section: Theorem 1 For Any Set Of N Points a Euclidean Space There E… (mentioning)
confidence: 99%
“…Koupaie et al [4] proposed cluster based outlier detection in data stream. Fichtenberger et al [3] proposed a data stream algorithm for the k-means problem called BICO (BIRCH Meets Core sets for k-means Clustering), that combines the data structure of the SIGMOD test of time award winning algorithm birch with the theoretical concept of corsets for clustering problems. Vijayarani and Jothi [11] two clustering algorithms namely BIRCH with k-means and Birch with CLARANS are used for clustering the data items and finding the outliers in data streams.…”
Section: Introduction and Formulation Of The Problem (mentioning)
confidence: 99%