2014 IEEE 5th International Conference on Software Engineering and Service Science
DOI: 10.1109/icsess.2014.6933652
Parallel K-Medoids clustering algorithm based on Hadoop

Abstract: The K-Medoids clustering algorithm addresses the K-Means algorithm's sensitivity to outlier samples, but it cannot process big data because of its time complexity [1]. MapReduce is a parallel programming model for processing big data, and it has been implemented in Hadoop. To overcome the big-data limitation, the parallel K-Medoids algorithm HK-Medoids, based on Hadoop, was proposed. Every submitted job runs many iterative MapReduce procedures: in the map phase, each sample was assigned to on…
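The truncated abstract describes an iterative assign-in-map, update-in-reduce scheme. A minimal single-machine sketch of one such iteration, with plain Python functions standing in for Hadoop's map and reduce phases (all function names here are hypothetical illustrations, not the paper's actual implementation):

```python
import math

def dist(a, b):
    # Euclidean distance between two points (tuples of floats)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_phase(points, medoids):
    # Map: assign each sample to its nearest medoid
    # (key = medoid index, value = the sample itself)
    clusters = {i: [] for i in range(len(medoids))}
    for p in points:
        i = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        clusters[i].append(p)
    return clusters

def reduce_phase(clusters):
    # Reduce: within each cluster, pick as the new medoid the member
    # that minimizes the total intra-cluster distance
    return [min(c, key=lambda m: sum(dist(m, p) for p in c))
            for c in clusters.values() if c]

def hk_medoids_sketch(points, medoids, max_iters=10):
    # Iterate map/reduce until the medoids stop changing
    for _ in range(max_iters):
        new = reduce_phase(map_phase(points, medoids))
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return medoids
```

On a Hadoop cluster the map phase would be distributed across data splits, with medoids broadcast to every mapper at the start of each job; this sketch only illustrates the per-iteration logic.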

Cited by 16 publications (4 citation statements)
References 4 publications
“…We are, to the best of our knowledge, the first team to scale a semi-metric K-medoids algorithm to 1 billion points (with 6 attributes) (see BillionOne dataset appendix F.2.1). PAMAE [37] does run on a dataset of around 4 billion, but the data is restricted to Euclidean space; other distributed semi-metric K-medoids algorithms [19,22,28,40,41] are not run at this scale. To compare to other non-Euclidean algorithms, HPDBSCAN [16] demonstrates runs up to 82 million points with 4 attributes.…”
Section: Results and Comparisons
confidence: 99%
“…In [14], the optimal search for medoids is performed based on the basic properties of triangular geometry. The speed of k-medoids clustering is improved while the validity of the clustering result is maintained [18]. Parallel k-medoids clustering can also be implemented on a Graphics Processing Unit (GPU).…”
Section: Article Info
confidence: 99%
“…[14][15][16][17], especially k-medoids clustering, which will be the focus of this research. One of the technologies used to develop parallel k-medoids clustering is Hadoop MapReduce [18][19][20][21][22]…”
confidence: 99%
“…Based on those improvements, and because the image set in the assistant platform grows constantly, the dynamically built tree of BIRCH suits the constantly growing image set without additional clustering passes; Jiang [7] therefore applied BIRCH to ancient characters together with an improved k-medoids algorithm, but the threshold is constant. The BIRCH clustering is based on subclusters, so many isolated data points are eliminated and the clustering speed is improved.…”
Section: Fig. 1 The Classical CF Tree
confidence: 99%