2009
DOI: 10.1109/tsmcb.2008.2004559

K-Means Clustering Versus Validation Measures: A Data-Distribution Perspective

Abstract: K-means is a well-known and widely used partitional clustering method. While there are considerable research efforts to characterize the key features of the K-means clustering algorithm, further investigation is needed to understand how data distributions can have impact on the performance of K-means clustering. To that end, in this paper, we provide a formal and organized study of the effect of skewed data distributions on K-means clustering. Along this line, we first formally illustrate that K-means tends to…

Cited by 230 publications (94 citation statements)
References 25 publications

“…The results proved that the algorithm is reliable. Xiong et al [7] studied K-Means in data-distribution perspective. They described many validation measures such as CV, purity, entropy and F-measure.…”
Section: Related Work (mentioning)
confidence: 99%
“…We made an empirical study besides review of literature to prove that the Fuzzy K-Means exhibits better clustering performance than K-Means. The literature on these two and their comparison besides other derivatives of them [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23] and [24] can be found in section IV.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, 100 elements were generated for each combination of EAs. After the clusters were generated, the quality of each one was evaluated using validation metrics such as Purity (P), which checks whether different elements exist in each cluster, and F-Measure (F), which evaluates the quality of each cluster based on precision and recall calculations [Xiong et al 2009]. Table 2 presents the results obtained on each metric for each of the algorithms tested.…”
Section: Proposed Approach (unclassified)
“…Entropy is a commonly used information theoretic external validation measure that measures the purity of the clusters with respect to given external class labels (Xiong et al, 2006). A perfect clustering has an entropy close to 0 which means that every cluster consists of points with only one class label.…”
Section: Clustering Evaluation (mentioning)
confidence: 99%
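
The entropy and purity measures mentioned in the statements above can be illustrated with a short sketch. This is not code from the cited paper: it assumes the common size-weighted definitions (entropy as the weighted average of each cluster's class-label entropy in bits, purity as the fraction of points that carry their cluster's majority label), and the function name `cluster_entropy_and_purity` is hypothetical.

```python
from collections import Counter
from math import log2


def cluster_entropy_and_purity(cluster_ids, class_labels):
    """Return (entropy, purity) of a clustering against external class labels.

    Assumed definitions (not taken from the cited paper):
      entropy = sum over clusters of (cluster size / n) * H(labels in cluster)
      purity  = sum over clusters of (majority label count / n)
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(class_labels)
    if n == 0:
        raise ValueError("at least one point is required")

    # Group the external class labels by assigned cluster.
    members_by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members_by_cluster.setdefault(cluster, []).append(label)

    entropy = 0.0
    purity = 0.0
    for members in members_by_cluster.values():
        size = len(members)
        counts = Counter(members)
        # Shannon entropy (base 2) of the class labels inside this cluster.
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h
        purity += max(counts.values()) / n
    return entropy, purity
```

Under these definitions, a clustering in which every cluster holds a single class label has entropy 0 and purity 1, which matches the "perfect clustering" description in the last statement; two clusters that each split evenly between two classes give entropy 1 bit and purity 0.5.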