2017 IEEE 33rd International Conference on Data Engineering (ICDE) 2017
DOI: 10.1109/icde.2017.102

Streaming k-Means Clustering with Fast Queries

Abstract: We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state of the art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current cluster…

Cited by 15 publications (7 citation statements)
References 14 publications

“…Clustering topics involves grouping similar topics into a set known as a cluster. The idea is that topics in one cluster are likely to be different compared to topics grouped under another cluster [19]. In other words, topics in one cluster are more co-related than those in another.…”
Section: Related Work (mentioning)
confidence: 99%
“…We assume a general dynamic setting, where new objects can be inserted into a system and arbitrary objects can be removed from the system. A dataset update corresponds to an object insertion or deletion, and it comes one by one [17], [19], [22]. Let P be a set of objects generated so far.…”
Section: Problem Definition (mentioning)
confidence: 99%
“…It is hence desirable to specify ρ_min and δ_min on-demand. This on-demand clustering query for dynamic data is common in the literature, e.g., [17], [20], [22], [23]. Once cluster centers are determined, cluster label propagation is done in Θ(n) time by traversing dependent objects from the cluster centers in a breadth-first search manner.…”
Section: Problem Definition (mentioning)
confidence: 99%
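
The label propagation step described in the statement above can be illustrated with a minimal Python sketch. It is not taken from the citing paper: the function name `propagate_labels` and the `dependents` dictionary (mapping each object to the objects that depend on it) are assumptions made for illustration. Each object is enqueued at most once, so the traversal is linear in the number of objects.

```python
from collections import deque


def propagate_labels(centers, dependents):
    """Give every reachable object the label of its cluster center.

    centers: list of object ids chosen as cluster centers.
    dependents: dict mapping an object id to the list of object ids
        that depend on it (hypothetical representation).
    """
    labels = {}
    queue = deque()
    # Each center starts its own cluster; its index is the cluster label.
    for label, center in enumerate(centers):
        labels[center] = label
        queue.append(center)
    # Breadth-first traversal: every object is visited at most once.
    while queue:
        obj = queue.popleft()
        for child in dependents.get(obj, []):
            if child not in labels:
                labels[child] = labels[obj]
                queue.append(child)
    return labels


example = propagate_labels(
    ["a", "d"],
    {"a": ["b", "c"], "b": ["e"], "d": ["f"]},
)
```

In this toy input, objects `b`, `c`, and `e` inherit the label of center `a`, and `f` inherits the label of center `d`.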
“…This is why approximation and data reduction techniques are popular choices for accelerating existing clustering algorithms. In fact, the paradigm of coresets has seen great success in the task of approximating solutions for clustering problems (see [10], [12], [4], [22] and [1]). The sensitivity framework, originally proposed for constructing coresets for clustering problems, requires a sub-optimal clustering of the input data D in order to compute the sensitivity for each point in D. This requirement transfers to CABLR, described in the previous section.…”
Section: Challenges and Problems (mentioning)
confidence: 99%
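
To make the dependence on a sub-optimal clustering concrete, here is a simplified Python sketch of sensitivity-style importance sampling for k-means. It is an illustration under stated assumptions, not the construction used by the cited works: the score (cost share plus inverse cluster size) omits the constants and extra terms that formal sensitivity bounds carry, and the names `sensitivity_coreset`, `rough_centers`, and `m` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sensitivity_coreset(points, rough_centers, m, seed=0):
    """Sample m weighted points using a simplified sensitivity score.

    points: non-empty list of tuples.
    rough_centers: centers of a rough (sub-optimal) clustering.
    Returns a list of (point, weight) pairs.
    """
    rng = random.Random(seed)
    k = len(rough_centers)
    # Assign each point to its nearest rough center.
    assign = [
        min(range(k), key=lambda j: sq_dist(p, rough_centers[j]))
        for p in points
    ]
    costs = [sq_dist(p, rough_centers[a]) for p, a in zip(points, assign)]
    total_cost = sum(costs)
    sizes = [0] * k
    for a in assign:
        sizes[a] += 1
    # Simplified score: share of the clustering cost plus 1 / cluster size.
    scores = [
        (c / total_cost if total_cost > 0 else 0.0) + 1.0 / sizes[a]
        for c, a in zip(costs, assign)
    ]
    total = sum(scores)
    probs = [s / total for s in scores]
    # Sample with replacement; weight 1 / (m * p_i) keeps cost estimates unbiased.
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
coreset = sensitivity_coreset(data, [(0.0, 0.0), (5.0, 5.0)], m=3)
```

Points that are far from their rough center or that sit in small clusters receive higher sampling probability, which is why the scores cannot be computed until some clustering of the data is available.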