2018
DOI: 10.1371/journal.pone.0201874

Balancing effort and benefit of K-means clustering algorithms in Big Data realms

Abstract: In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower…
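
As an illustration of the kind of stopping rule the abstract describes, here is a minimal pure-Python sketch of Lloyd-style k-means that stops as soon as the number of objects changing cluster in an iteration drops below a threshold. The abstract is truncated before it states the actual threshold, so `threshold_fraction` (a fraction of n), the function name `kmeans_early_stop`, the random initialization, and the default values are all assumptions made for this example, not the authors' method.

```python
import random


def kmeans_early_stop(points, k, threshold_fraction=0.01, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once few objects change cluster.

    threshold_fraction is a hypothetical parameter (the source abstract is
    truncated before it gives the real threshold): iteration stops when
    fewer than threshold_fraction * n objects were reassigned.
    """
    rng = random.Random(seed)
    # Initialization step (assumption): pick k distinct objects as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        changed = 0
        # Classification step: assign each object to its nearest centroid.
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            if best != labels[i]:
                labels[i] = best
                changed += 1
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Convergence step: stop early when few objects moved.
        if changed < threshold_fraction * len(points):
            break
    return labels, centroids, iteration


if __name__ == "__main__":
    data_rng = random.Random(1)
    pts = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(100)]
    pts += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(100)]
    labels, centroids, iters = kmeans_early_stop(pts, k=2)
    print("iterations used:", iters)
```

With a very small `threshold_fraction` the loop behaves like the classical criterion (stop when almost no object changes cluster); larger values trade solution quality for fewer iterations, which is the balance the title refers to.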

citations
Cited by 26 publications
(20 citation statements)
references
References 30 publications
“…K-means clustering is a kind of data clustering techniques to divide cases or variables of a dataset into non-overlapping groups/clusters, based on the characteristics uncovered. The goal is to produce groups of cases/variables with a high degree of similarity within each group and a low degree of similarity between groups [26–29]. In this method, we only used the FBIS score and classified the sample into high burden and low burden group by K-means clustering to get a cutoff point for the FBIS score.…”
Section: Methods (mentioning)
confidence: 99%
“…Median lesion intensities per lesion from the FSPGR, SE, and FLAIR sequences were used, the number of clusters was set to 2; the iterate and classify method was used, and the number of maximum iterations was set to 10. The K-means cluster algorithm creates clusters from the dataset, placing centroids in a way that the data in a given cluster have similar attributes or closeness to the centroid, whilst the distance between clusters (centroids) is maximized (23). In order to quantify how median intensity values of the two lesion type clusters differ from the normal white matter intensity profile, we employed a bootstrap-based approach using a custom-made MATLAB script.…”
Section: Methods (mentioning)
confidence: 99%
“…The Clustering algorithm has the advantage of finding a solution for a large complex vehicle routing and scheduling problem by splitting the problem into sub-problems of smaller clusters to solve, which is relatively easier, and combining the outcomes to form a total solution. It can provide a good balance between effort and quality of solution [101]. The shortcoming is that it can be challenging in splitting the original problem into an appropriate number of clusters to obtain optimality.…”
Section: Solution Approaches (mentioning)
confidence: 99%