A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis

Fahad, Adil; Alshatri, Najlaa; Tari, Zahir; Alamri, Abdullah; Khalil, Ibrahim; Zomaya, Albert Y.; Foufou, Sebti; Bouras, Abdelaziz

doi:10.1109/tetc.2014.2330519

Cited by 912 publications

(455 citation statements)

References 29 publications

Supporting

Mentioning

437

Contrasting

Unclassified


“…In density-based clustering, clusters are denoted with the density of the regions. The cluster grows in any direction leading by its density and each cluster has to contain at least minimum number of data in a neighborhood of a given radius [3]. In grid based algorithms, initially clustering space is divided into finite number of cells.…”

Section: Introduction (mentioning)

confidence: 99%
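The density criterion quoted above (a cluster keeps growing as long as each point has at least a minimum number of neighbours within a given radius) can be sketched in a few lines. This is a minimal, DBSCAN-style illustration and not code from the cited paper; the names `dbscan`, `region_query`, `eps`, and `min_pts` are assumptions chosen for this example.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # dense neighbourhood: keep growing
    return labels
```

With two tight groups of four points and one isolated point, `dbscan(points, eps=1.5, min_pts=3)` assigns the groups to clusters 0 and 1 and marks the isolated point as noise.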

“…Then, clustering is performed on the grid [6]. Model based approaches begin with random parameter sets and they adjust these parameter sets over the iterations to find maximum likelihood estimator [3]. Similar to other optimization problems, clustering is also a kind of NP-hard problem and one of the biggest problem encountered in clustering approaches is to estimate the number of clusters.…”

Section: Introduction (mentioning)

confidence: 99%
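The model-based procedure described in the statement above (start from random parameters, then adjust them iteratively toward a maximum-likelihood estimate) is commonly realised with expectation-maximization. The sketch below fits a one-dimensional Gaussian mixture; it is an illustration under stated assumptions, not the method of the cited paper. The function names, the unit starting variance, and the small numerical guards are all choices made for this example.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, history), where history holds the
    log-likelihood measured at the start of each iteration.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)        # random starting means (assumed init)
    variances = [1.0] * k              # assumed starting variance
    weights = [1.0 / k] * k
    history = []
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp, loglik = [], 0.0
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), 1e-300)    # guard against log(0) on underflow
            loglik += math.log(total)
            resp.append([pc / total for pc in p])
        history.append(loglik)
        # M-step: re-estimate parameters from the weighted data.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, history
```

Each iteration cannot decrease the log-likelihood, so `history` gives a simple convergence check. Like any EM fit, the result depends on the random start and may be a local optimum.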


New Pattern Reduction Method for Generalized Regression Neural Network

Kartal

Oral

2017

IJARCSSE


Abstract: Generalized Regression Neural Network (GRNN) is a radial basis function based neural network used for function approximation and prediction. Thanks to its easy modelling structure and one-pass learning, it has been utilized in many applications as an alternative to other prediction methods such as multilayer perceptron (MLP) and support vector machines (SVM). Since the number of neurons in GRNN's pattern layer is proportional to the number of training samples in the dataset, memory usage and computation time both increase for huge datasets. Therefore, k-nearest neighbour (kNN) and clustering methods such as k-means and hierarchical clustering have frequently been used for pattern layer size reduction. Pattern layer size reduction may provide not only a simpler structure but also an increase in prediction accuracy. In this work, a pattern layer size reduction approach utilizing the Angle Based Nearest Neighbor (ABNN) algorithm is proposed for three-dimensional datasets. The proposed method divides the training space into specific angles and, for each test datum, searches for the nearest training datum within each angle. In the end, only a few training data, all similar to the test datum, are used in GRNN's pattern layer. Performance of the proposed method was evaluated using fifteen benchmark global optimization test functions and compared with that of standard GRNN and a hybrid method using kNN as a pre-processor. Simulation results show that the proposed method provides 99.33% reduction in pattern layer size and accuracy is also increased maximally to 65.61%.

Keywords: Generalized regression neural network, prediction neural networks, nearest neighbor, pattern reduction, reduced dataset.


“…However, such techniques cannot correct mistaken decisions that once have taken. There are two approaches that can help in improving the quality of hierarchical clustering: (1) Firstly to perform careful analysis of object linkages at each hierarchical partitioning or (2) By integrating hierarchical agglomeration and other approaches by first using a hierarchical agglomerative algorithm to group objects into micro-clusters, and then performing macro-clustering on the micro-clusters using another clustering method such as iterative relocation. One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical clustering with other clustering techniques for multiple phase clustering.…”

Section: Introduction (mentioning)

confidence: 99%

“…One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical clustering with other clustering techniques for multiple phase clustering. So in order to make improvement in hierarchical clustering we merge some other techniques or method in to it [1]. Every day, 2.5 quintillion bytes of data are created and 90 percent of the data in the world today were produced within the past two years.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of Hierarchical Clustering Algorithm to Handle Large Dataset

2014

IJAERD


“…Partitioning-Based Clustering Algorithms [9] In partitioning-based algorithms, the data is distributed into various data subsets. The reason behind this splitting is lack of feasibility to check every possible subset; there are certain greedy probing schemes which are used in form of iterative inflation.…”

mentioning

confidence: 99%
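The statement above does not name a specific partitioning algorithm. As one common instance of the greedy, iterative scheme it describes, the sketch below uses Lloyd-style k-means: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until assignments stop changing. The function name, the random initialisation, and the `max_iter` cap are assumptions made for this illustration.

```python
import random
from math import dist

def kmeans(points, k, max_iter=100, seed=0):
    """Partition points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                      # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return labels, centroids
```

The search is greedy: it never examines all possible partitions, so the result depends on the starting centroids and can be a local optimum. Running it with several seeds and keeping the partition with the lowest within-cluster distance is a common remedy.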