The amount of data generated on the internet grows rapidly every day, and this growth demands equally rapid processing. The MapReduce technique is applied to the distributed processing of huge data sets; its main idea is job parallelization. The MapReduce algorithm involves two key tasks, namely Map and Reduce. First, the Map task takes a set of input data and breaks it down into tuples (key/value pairs). Second, the Reduce task takes the Map output as its input, and the Reducers aggregate these tuples into the final result. Job clustering can determine the allocation of jobs to the mappers and reducers. In recent years, this method has frequently been used for job allocation in MapReduce to shorten the execution time of big data processing [1].
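The Map and Reduce phases described above can be sketched with the classic word-count example. The following is a minimal single-machine illustration, not the distributed framework itself; the function names and the explicit shuffle step are chosen here for clarity and are not taken from the source.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: break each input record into (key, value) tuples -- here (word, 1)."""
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))
    return pairs

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the list of values for each key -- here, sum the counts."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data needs processing", "big data grows"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] == 2 and counts["data"] == 2
```

In a real deployment the Map and Reduce calls run in parallel across many machines, and the shuffle is performed by the framework over the network; job allocation decides which mappers and reducers receive which tasks.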