Proceedings of the 11th ACM Conference on Computing Frontiers 2014
DOI: 10.1145/2597917.2597918

A collaborative divide-and-conquer K-means clustering algorithm for processing large data

Abstract: K-means clustering plays a vital role in data mining. Because it is an iterative computation, its performance suffers on very large datasets due to poor temporal locality across iterations. The state-of-the-art streaming algorithm, which streams the data from disk into memory and operates on the partitioned streams, improves temporal locality but can misplace objects in clusters because different partitions are processed locally. This paper presents a collaborative divide-and-conquer algorithm …
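The abstract is truncated, so the collaborative algorithm itself cannot be reproduced from this page. As a point of reference, here is a minimal Python sketch of the partition-local baseline it contrasts with: each partition is clustered on its own, and the resulting centers, weighted by how many objects each absorbed, are clustered again to give the final k centers. This is a generic divide-and-conquer scheme, not Cui et al.'s method; the function names, chunking by position, the fixed iteration count, and random initialization are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: _sq_dist(p, centers[j]))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's algorithm where each point carries a weight (hypothetical helper).

    Returns (centers, sizes); sizes[j] is the total weight assigned to center j.
    """
    rng = random.Random(seed)
    k = min(k, len(points))  # a small partition may hold fewer than k points
    centers = [list(p) for p in rng.sample(points, k)]  # assumption: random init
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = _nearest(p, centers)
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if totals[j] > 0:  # an empty cluster keeps its previous center
                centers[j] = [s / totals[j] for s in sums[j]]
    sizes = [0.0] * k
    for p, w in zip(points, weights):
        sizes[_nearest(p, centers)] += w
    return centers, sizes


def divide_and_conquer_kmeans(points, k, n_chunks, seed=0):
    """Cluster each partition locally, then cluster the weighted local centers."""
    chunk_size = -(-len(points) // n_chunks)  # ceiling division
    local_centers, local_weights = [], []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Local step: sees only this partition, so objects can be misplaced here.
        centers, sizes = weighted_kmeans(chunk, [1.0] * len(chunk), k, seed=seed)
        for c, s in zip(centers, sizes):
            if s > 0:
                local_centers.append(c)
                local_weights.append(s)
    # Merge step: each local center stands in for the objects it absorbed.
    final_centers, _ = weighted_kmeans(local_centers, local_weights, k, seed=seed)
    return final_centers
```

Each partition is read as one contiguous chunk and reduced to at most k weighted centers. The weakness the abstract points to sits in the local step: an object's cluster is decided without seeing the other partitions.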

References 28 publications
Cited by 13 publications (10 citation statements)
“…More recently, [63] proposed a new efficient way to deal with large distributed datasets. The method is based on a collaborative divide-and-conquer algorithm using k-means as base clustering algorithm.…”
Section: Distributed Data; classification: mentioning (confidence: 99%)
“…K-means partitions a number of objects into k clusters such that similar objects belong to the same cluster [12].…”
Section: Platform and Benchmark; classification: mentioning (confidence: 99%)
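K-means is most commonly computed with Lloyd's algorithm, which alternates an assignment step and an update step until no object changes cluster. The sketch below assumes Euclidean distance and initial centers sampled from the data; neither detail comes from the citing paper, and the function name `kmeans` is hypothetical. Every iteration scans all objects, which is the repeated pass whose poor temporal locality the abstract identifies on large data.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd's K-means. Returns (centers, labels); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # assumption: random initial centers
    labels = None
    for _ in range(max_iters):
        # Assignment step: a full pass over every object, once per iteration.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels
```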
“…The scores in the rating matrix represent the significant features for the users and items, but the rating matrix commonly contains unknown rating scores (data sparsity), which lowers the accuracy of the predicted scores. However, during the streaming of rating scores into the rating matrix, some rating scores deviate from their accurate places (Cui et al, 2014). Usually, the deviation is caused by streaming huge numbers of rating scores into the rating matrix without sorting and managing these scores to extract the accurate latent feedback.…”
Section: Introduction; classification: mentioning (confidence: 99%)
“…The DFC algorithm randomly divides the large-scale matrix factorization task into smaller sub-problems, solves those sub-problems in parallel, and then combines them using ensemble methods based on low-rank approximations (Mackey et al, 2011). Cui et al (2014) have proposed the state-of-the-art divide-and-conquer k-means clustering algorithm to reduce the imprecision in rearranging the streaming data. Mackey et al (2011) have rearranged the matrix factorization based on the ensemble method, and Cui et al (2014) have identified the data places based on the clustering method and its relations.…”
Section: Introduction; classification: mentioning (confidence: 99%)
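DFC is described here only at a high level. The following is a minimal NumPy sketch of the divide-factor-combine pattern under simplifying assumptions: the matrix is fully observed, truncated SVD is swapped in as the base factorization for each sub-problem (the published DFC targets noisy matrix completion and robust factorization with an iterative base solver, omitted here), and the combine step shown is column projection onto the first block's estimate. The names `low_rank` and `dfc_proj` are hypothetical.

```python
import numpy as np


def low_rank(M: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of M via truncated SVD (assumed base factorization)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def dfc_proj(M: np.ndarray, rank: int, n_blocks: int, seed: int = 0) -> np.ndarray:
    """Divide-factor-combine with a column-projection combine step (sketch)."""
    rng = np.random.default_rng(seed)
    # Divide: randomly partition the columns into n_blocks sub-problems.
    blocks = np.array_split(rng.permutation(M.shape[1]), n_blocks)
    # Factor: sub-problems are independent, so this loop could run in parallel.
    estimates = [low_rank(M[:, idx], rank) for idx in blocks]
    # Combine: project every block estimate onto the column space of the first one.
    U = np.linalg.svd(estimates[0], full_matrices=False)[0][:, :rank]
    combined = np.empty(M.shape, dtype=float)
    for idx, est in zip(blocks, estimates):
        combined[:, idx] = U @ (U.T @ est)
    return combined
```

Every column of the combined result lies in the same `rank`-dimensional subspace, so the output has rank at most `rank` even when the per-block estimates disagree slightly on noisy data. Each block needs at least `rank` columns for its estimate to span that subspace.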