2010
DOI: 10.1007/s00357-010-9049-5
Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads

Abstract: The issue of determining "the right number of clusters" in K-Means has attracted considerable interest, especially in the recent years. Cluster overlap appears to be a factor most affecting the clustering results. This paper proposes an experimental setting for comparison of different approaches at data generated from Gaussian clusters with the controlled parameters of between- and within-cluster spread to model different cluster overlaps. The setting allows for evaluating the centroid recovery on par with conv…

Cited by 274 publications (163 citation statements)
References 45 publications
“…To eliminate both problems, repeated clustering is necessary (typically 25 runs are used). The most suitable number of clusters can be determined by various criteria such as the elbow (bend) rule, the Hartigan index, the Gap statistic, the average silhouette, the Akaike information criterion, etc.; see Meloun and Militký (2006) or Chiang and Mirkin (2009).…”
Section: The Methods
mentioning
confidence: 99%
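The average-silhouette criterion mentioned in this excerpt can be sketched with scikit-learn; the data, the candidate range of K, and the 25 restarts are illustrative choices, not taken from the cited works:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian clusters (synthetic stand-in data).
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(50, 2)),
])

# For each candidate K, run K-Means with 25 random restarts (n_init=25,
# echoing the "typically 25 runs" remark) and score the partition by the
# average silhouette width.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=25, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # K with the highest average silhouette
```

On data this cleanly separated, the silhouette criterion recovers the generated number of clusters; on overlapping clusters (the paper's focus) the criteria can disagree, which is exactly what the experimental setting is designed to probe.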
“…Hartigan (1975), Meloun and Militký (2006) or Chiang and Mirkin (2009). This method divides objects into clusters (groups) based on the objects' similarity or proximity, which is appropriate for the presented study, as k-means clustering will result in a set of (small) groups of countries with similar economic development in the examined period.…”
Section: The Methods
mentioning
confidence: 99%
“…Some of these have been the target of research effort for a long time. For instance, K-Means requires K (the number of clusters in Y) to be known beforehand, and the clustering produced by K-Means can be heavily affected by the initial centroids used in its first step [3], [5], [24], [30], [33], [35].…”
Section: Introduction
mentioning
confidence: 99%
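The sensitivity to initial centroids, and the repeated-runs remedy from the earlier excerpt, can be illustrated with a bare-bones Lloyd's algorithm (a minimal sketch on synthetic data; the helper and its parameters are not from the cited papers):

```python
import numpy as np

def kmeans_inertia(X, centroids, n_iter=50):
    """One run of Lloyd's algorithm from the given initial centroids;
    returns the final within-cluster sum of squared distances."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        for j in range(len(centroids)):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

rng = np.random.default_rng(1)
# Three well-separated Gaussian clusters in 2-D.
X = np.vstack([rng.normal(m, 0.2, size=(40, 2)) for m in (0.0, 2.0, 4.0)])

# 25 random restarts; keeping the lowest-inertia run mitigates the
# dependence on the initial centroids that the excerpt describes.
inertias = [kmeans_inertia(X, X[rng.choice(len(X), 3, replace=False)])
            for _ in range(25)]
best = min(inertias)
```

Different restarts typically land in different local minima here, which is why a single run of K-Means is not trusted in practice.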
“…Among the many algorithms addressing these two interrelated issues, intelligent K-Means (iK-Means) seems quite successful [5], [7], [28]. This algorithm finds the clusters in a data set by extracting one anomalous pattern at a time.…”
Section: Introduction
mentioning
confidence: 99%
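The one-anomalous-pattern-at-a-time idea can be sketched as follows. This is an assumed reading of the excerpt, not the published iK-Means implementation: grow a cluster around the point farthest from the grand mean, keep the grand mean fixed as the reference point, extract the cluster, and repeat; the surviving patterns supply both K and the initial centroids.

```python
import numpy as np

def anomalous_patterns(X, min_size=2):
    """Sketch of anomalous-pattern extraction in the iK-Means spirit
    (an assumed reading of the excerpt, not the authors' code)."""
    grand_mean = X.mean(axis=0)
    remaining = X.copy()
    centroids = []
    while len(remaining):
        d_gm = ((remaining - grand_mean) ** 2).sum(axis=1)
        c = remaining[d_gm.argmax()].copy()   # tentative anomalous centroid
        while True:
            # Points closer to c than to the fixed grand mean form the pattern.
            in_pat = (((remaining - c) ** 2).sum(axis=1)
                      < ((remaining - grand_mean) ** 2).sum(axis=1))
            if not in_pat.any():
                break
            new_c = remaining[in_pat].mean(axis=0)
            if np.allclose(new_c, c):
                break
            c = new_c
        if not in_pat.any():
            break
        if in_pat.sum() >= min_size:          # discard tiny patterns
            centroids.append(c)
        remaining = remaining[~in_pat]        # extract the pattern, repeat
    return np.array(centroids)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.2, size=(40, 2)),
               rng.normal(4.0, 0.2, size=(40, 2))])
cents = anomalous_patterns(X)   # one centroid per extracted pattern
```

On two well-separated blobs this sketch extracts two patterns, so K is inferred from the data rather than supplied beforehand.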
“…The choice of the range, rather than of the standard deviation, in (16) has yet another interesting characteristic: unlike the latter, the former is not biased towards unimodal distributions [21,6]. Consider the following example of two features, a being unimodal and b being bimodal.…”
mentioning
confidence: 99%
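A quick numeric illustration of the unimodal/bimodal point in this excerpt (synthetic features of my own choosing, not the example from the cited paper): a bimodal feature has a larger standard deviation relative to its range, so dividing by the standard deviation shrinks precisely the feature that carries cluster structure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
# Feature a: unimodal; feature b: bimodal (modes near -0.7 and +0.7).
a = rng.normal(0.0, 0.25, size=n)
b = np.where(rng.random(n) < 0.5,
             rng.normal(-0.7, 0.1, size=n),
             rng.normal(+0.7, 0.1, size=n))

# std/range is larger for the bimodal feature, so std-normalization
# downweights b relative to range-normalization, the bias in question.
ratio_a = a.std() / (a.max() - a.min())
ratio_b = b.std() / (b.max() - b.min())
```

Here `ratio_b` exceeds `ratio_a`, which is the sense in which standard-deviation scaling is biased towards unimodal features while range scaling is not.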