2014
DOI: 10.1109/tetc.2014.2330519

A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis

Abstract: Clustering algorithms have emerged as a powerful alternative meta-learning tool to accurately analyze the massive volume of data generated by modern applications. In particular, their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering and there have been attempts to analyze and categorize them for a larger number of applications. However, one of the major…

citations
Cited by 912 publications
(455 citation statements)
references
References 29 publications
“…In density-based clustering, clusters are defined by the density of regions. A cluster grows in any direction that its density leads, and each cluster must contain at least a minimum number of data points within a neighborhood of a given radius [3]. In grid-based algorithms, the clustering space is initially divided into a finite number of cells.…”
Section: Introduction
mentioning
confidence: 99%
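The density-based idea quoted above (a cluster keeps growing wherever a neighborhood of a given radius holds at least a minimum number of points) can be sketched as follows. This is a minimal DBSCAN-style illustration, not code from the surveyed paper or the citing work; the names `region_query`, `density_cluster`, `eps`, and `min_pts` are assumptions chosen for the example, and the neighborhood search is a naive linear scan.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def density_cluster(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    eps and min_pts are the assumed 'radius' and 'minimum number of data
    points' from the quoted description.
    """
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is dense enough: start a new cluster and grow it outward.
        labels[i] = cluster_id
        frontier = list(neighbours)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                frontier.extend(j_neighbours)  # dense point: keep growing
        cluster_id += 1
    return labels
```

Because growth follows density rather than distance to a centre, clusters found this way can take arbitrary shapes, and isolated points are left labelled as noise.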
“…Then, clustering is performed on the grid [6]. Model-based approaches begin with random parameter sets and adjust them over the iterations to find the maximum likelihood estimator [3]. Like many other optimization problems, clustering is NP-hard, and one of the biggest problems encountered in clustering approaches is estimating the number of clusters.…”
Section: Introduction
mentioning
confidence: 99%
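The model-based loop described in the statement above (start from random parameters, then adjust them iteratively toward a maximum likelihood fit) is commonly realized with expectation-maximization. The sketch below assumes a one-dimensional Gaussian mixture; the function names, the variance floor, and the fixed iteration count are illustrative assumptions, not details taken from the cited works.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iterations=100, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative sketch)."""
    rng = random.Random(seed)
    means = rng.sample(data, k)   # random initial parameters: k distinct data points
    variances = [1.0] * k         # assumed starting variance
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pi / total for pi in p])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances
```

Note that `k` is supplied by the caller here, which is exactly the difficulty the quoted statement points out: the number of clusters has to be estimated separately.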
“…However, such techniques cannot correct mistaken decisions that once have taken. There are two approaches that can help in improving the quality of hierarchical clustering: (1) Firstly to perform careful analysis of object linkages at each hierarchical partitioning or (2) By integrating hierarchical agglomeration and other approaches by first using a hierarchical agglomerative algorithm to group objects into micro-clusters, and then performing macro-clustering on the micro-clusters using another clustering method such as iterative relocation. One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical clustering with other clustering techniques for multiple phase clustering.…”
Section: Introduction
mentioning
confidence: 99%
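Approach (2) in the statement above, agglomerative micro-clustering followed by macro-clustering with iterative relocation, might be sketched as below. This is an assumption-laden illustration: centroid linkage, the `max_merge_dist` stopping threshold, and the k-means-style relocation with a naive "first k micro-clusters" initialization are choices made for the example and are not specified by the quoted text.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def micro_clusters(points, max_merge_dist):
    """Phase 1: agglomerative merging (centroid linkage) until the closest
    pair of groups is farther apart than max_merge_dist."""
    groups = [[p] for p in points]
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = dist(centroid(groups[a]), centroid(groups[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_merge_dist:
            break
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    return groups

def macro_clusters(groups, k, iterations=20):
    """Phase 2: iterative relocation of whole micro-clusters among k macro-clusters.
    Requires k <= len(groups)."""
    reps = [centroid(g) for g in groups]
    centres = reps[:k]  # naive deterministic initialization (assumption)
    assign = [0] * len(reps)
    for _ in range(iterations):
        # Relocate each micro-cluster to its nearest macro-cluster centre.
        assign = [min(range(k), key=lambda c: dist(r, centres[c])) for r in reps]
        # Recompute each centre from all original points assigned to it.
        for c in range(k):
            members = [p for g, a in zip(groups, assign) if a == c for p in g]
            if members:
                centres[c] = centroid(members)
    return assign
```

Unlike a purely hierarchical merge, the relocation phase can move a micro-cluster to a different macro-cluster in a later iteration, which is how this combination softens the "cannot correct mistaken decisions" limitation mentioned in the quote.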
“…One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical clustering with other clustering techniques for multiple phase clustering. So in order to make improvement in hierarchical clustering we merge some other techniques or method in to it [1]. Every day, 2.5 quintillion bytes of data are created and 90 percent of the data in the world today were produced within the past two years.…”
Section: Introduction
mentioning
confidence: 99%
“…Partitioning-Based Clustering Algorithms [9] In partitioning-based algorithms, the data is distributed into various data subsets. The reason behind this splitting is lack of feasibility to check every possible subset; there are certain greedy probing schemes which are used in form of iterative inflation.…”
mentioning
confidence: 99%