2016
DOI: 10.11591/ijece.v6i6.11207

Issues of K Means Clustering While Migrating to Map Reduce Paradigm with Big Data: A Survey

Abstract: In recent times, Big Data analysis has emerged as an essential area in the field of Computer Science. Extracting significant information from Big Data by separating the data into distinct groups is a crucial task, and it is beyond the scope of a commonly used personal machine. It is necessary to adopt a distributed environment such as the MapReduce paradigm and migrate the data mining algorithm to it. In Data Mining, the partition-based K Means Clustering is one of the broadly used algorith…

Cited by 8 publications (4 citation statements)
References 9 publications
“…Variable selection using feature selection. The result of selecting variables using the tools Jupyter notebook in Figure 4.1. The optimal number of k in this study used the Elbow method, because k-Means has a weakness in determining the number of initial clusters determined randomly [8]. The best number of k for clusters 1 to 10 using the Elbow Method is k=2.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
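
The Elbow method mentioned in this statement can be illustrated with a minimal sketch. It is not the citing authors' code: the toy `points`, the naive "first k points" initialisation, and the helper names `kmeans` and `wcss` are all assumptions made for illustration. The idea is to compute the within-cluster sum of squares (WCSS) for each candidate k and look for the k after which the improvement flattens out.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means; returns the final centroids."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def wcss(points, centroids):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)

# Hypothetical toy data with two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
scores = {k: wcss(points, kmeans(points, k)) for k in range(1, 5)}

for k, score in scores.items():
    print(k, round(score, 3))
```

On this toy data the WCSS falls sharply from k=1 to k=2 and only slightly afterwards, which is the "elbow" the method looks for. Real data rarely gives such a clean bend, so the choice of k usually remains a judgment call.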
“…Fuzzy C-Means algorithm has a faster and easier process time to interpret [6]; however, it has weaknesses in the calculation process and fuzzy iterations that use longer time than the K-Means algorithm [7]. The K-Means algorithm is widely applied to research because it is more efficient in categorizing data with very large amounts, but this algorithm is not quite right in random selection of centroid starting points and determining the initial number of clusters [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Mapreduce [12] can process data in parallel by the use of map and reduce phase. Kmeans is deployed on Mapreduce with parallel calculation of clusters for processing large scale of data [4], [14], [15]. Similarity between data objects and clusters are different for every object.…”
Section: Proposed Technique (mentioning)
confidence: 99%
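
As a rough illustration of how K-means maps onto the map and reduce phases described in this statement, the sketch below runs one iteration in plain Python. It is an assumption-laden toy, not the method of the surveyed or citing papers: the function names `map_phase`, `reduce_phase`, and `kmeans_iteration` are hypothetical, and no real MapReduce framework is involved. The map step emits a (nearest centroid index, point) pair per point, and the reduce step averages the points grouped under each key.

```python
from collections import defaultdict
from math import dist

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points that share a centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids

def kmeans_iteration(points, centroids):
    """One K-means iteration expressed as a map step followed by a reduce step."""
    return reduce_phase(map_phase(points, centroids), centroids)

# Hypothetical toy data and starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
centroids = kmeans_iteration(points, centroids)
print(centroids)
```

In a distributed setting the map step would run independently on each data partition, which is why the assignment of points to centroids parallelises well; the iteration would be repeated with the updated centroids until they stop moving.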
“…The methods of data mining, among others clustering methods, classification methods, etc., are needed to extract or mine the knowledge from large amounts of data. To group the data in accordance with their multiple-characteristic based similarities is known as clustering [1].…”
Section: Introduction (mentioning)
confidence: 99%