2006
DOI: 10.1348/000711005x48266

K‐means clustering: A half‐century synthesis

Abstract: This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years. The K-means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions …
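For reference, the minimum variance (sum-of-squared-errors) criterion that K-means minimizes is usually written as below, for n objects x_i partitioned into K clusters C_k of sizes n_k with centroids x̄_k. The notation is ours and may differ from the paper's. The second line is a standard identity that expresses the same quantity through within-cluster pairwise distances, one example of the alternative formulations the abstract mentions.

```latex
W(C_1,\dots,C_K) = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert \mathbf{x}_i - \bar{\mathbf{x}}_k \rVert^2,
\qquad
\bar{\mathbf{x}}_k = \frac{1}{n_k} \sum_{i \in C_k} \mathbf{x}_i

% Equivalent pairwise form (i and j both range over the members of C_k):
W(C_1,\dots,C_K) = \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i \in C_k} \sum_{j \in C_k} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
```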

Cited by 854 publications (615 citation statements)
References 183 publications (267 reference statements)
“…To deal with 17 missing values (6.2%) of these difference scores, multiple imputation was applied (Basagaña, Barrera-Gómez, Benet, Antó, & Garcia-Aymerich, 2013) using the R software package developed by Buuren and Groothuis-Oudshoorn (MICE; 2011) to obtain 100 imputed data sets. On these datasets, k-Means cluster analysis (Steinley, 2006) was carried out using the R software package developed by Barrera-Gómez and Basagaña (MICLUST; 2013) to identify clusters of similar patterns of PSS-SR scores. In k-Means clustering an iterative process reallocates participants to a certain number of clusters in order to minimize the within-cluster variance.…”
Section: Methods (mentioning; confidence: 99%)
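The quoted passage summarizes K-means as an iterative process that reallocates objects among a fixed number of clusters to minimize within-cluster variance. A minimal pure-Python sketch of that idea, assuming the common Lloyd-style batch scheme (assign every object to its nearest centroid, then recompute the centroids) with Euclidean distance and randomly sampled data points as initial seeds. It is not the MICLUST package used in the quoted study, and the names `kmeans`, `centroid`, and `squared_distance` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point], float]:
    """Lloyd-style K-means; returns (labels, centroids, SSE)."""
    rng = random.Random(seed)
    # Initialization (an assumption): k data points sampled without replacement.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: move every object to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = centroid(members)
    # Minimum variance criterion: within-cluster sum of squared errors.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, centroids, sse
```

For example, calling `kmeans` with k=2 on the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) separates the three points near the origin from the three near (10, 10). Because the outcome can depend on the initial seeds in general, it is common to rerun with several seeds and keep the solution with the lowest SSE.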
“…Hundreds of classification and partition algorithms can be used to stratify heterogeneity (Lu and Carlin, 2004; Jain, 2009; Jiao et al., 2011). Examples include K-means grouping (Steinhaus, 1957; MacQueen, 1967; Steinley, 2006) and regression trees (Breiman et al., 1984), which are implemented in extensively used software packages, ARCGIS (©Esri Inc.) and R/SPODT. The effectiveness of these algorithms is measured by the Calinski-Harabasz pseudo F-statistic (Calinski and Harabasz, 1974), which is a ratio reflecting the within-group similarity and between-group differences, and the Gini/Information Gain/Chi-square test, respectively.…”
Section: Introduction (mentioning; confidence: 99%)
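The Calinski-Harabasz pseudo F-statistic mentioned above is conventionally computed as the between-cluster sum of squares B divided by (k - 1), over the within-cluster sum of squares W divided by (n - k), where n is the number of objects and k the number of clusters; larger values indicate compact, well-separated clusters. The sketch below assumes Euclidean distance and points stored as plain Python tuples. `calinski_harabasz` is a hypothetical helper, not the ArcGIS implementation referred to in the quoted passage.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(data: List[Point], labels: List[int]) -> float:
    """Pseudo F = [B / (k - 1)] / [W / (n - k)]; larger is better."""
    n = len(data)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n clusters")
    overall = _mean(data)
    within = between = 0.0
    for c in clusters:
        members = [p for p, lab in zip(data, labels) if lab == c]
        center = _mean(members)
        # W: squared distances of members to their own cluster centroid.
        within += sum(_sqdist(p, center) for p in members)
        # B: cluster size times squared distance of centroid to grand mean.
        between += len(members) * _sqdist(center, overall)
    if within == 0.0:
        return float("inf")  # every cluster collapses to a single point
    return (between / (k - 1)) / (within / (n - k))
```

To choose the number of clusters with this index, one would compute it for the partitions obtained at several candidate values of k and prefer the k with the largest value.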
“…The literature on the subject offers many quantitative clustering-quality indices, created to make it easier to choose the best number of groups (e.g. [Milligan, Cooper 1985; Rousseeuw 1987; Migdał-Najman, Najman 2005; Steinley 2006]). In this work, the Baker and Hubert index, the Hubert and Levine index, and Rousseeuw's Silhouette index were used.…”
Section: (X Y) - Longest Common Subsequence (LCS) Najdłuższy Ws… (unclassified)
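Two of the indices named in the quoted passage can be sketched directly from their usual definitions, assuming Euclidean distances. Rousseeuw's silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from object i to the other members of its cluster and b(i) is the smallest mean distance to the members of any other cluster (higher is better). The Hubert and Levine C index is (S - S_min) / (S_max - S_min), where S sums the N_w within-cluster pairwise distances and S_min and S_max sum the N_w smallest and largest pairwise distances overall (lower is better). The function names are hypothetical, and the Baker and Hubert index is not covered.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(data: List[Point], labels: List[int]) -> List[float]:
    """Per-object silhouette values s(i) in [-1, 1]."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(data, labels)):
        # a(i): mean distance to the other members of i's own cluster.
        same = [_dist(p, q) for j, (q, lab) in enumerate(zip(data, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(_dist(p, q) for q, lab in zip(data, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(data: List[Point], labels: List[int]) -> float:
    """Average silhouette width over all objects (higher is better)."""
    scores = silhouette_scores(data, labels)
    return sum(scores) / len(scores)


def c_index(data: List[Point], labels: List[int]) -> float:
    """Hubert-Levine C index, nominally in [0, 1] (lower is better)."""
    pairs = [(_dist(data[i], data[j]), labels[i] == labels[j])
             for i, j in combinations(range(len(data)), 2)]
    within = [d for d, same in pairs if same]
    n_w = len(within)
    if n_w == 0:
        raise ValueError("no within-cluster pairs")
    ordered = sorted(d for d, _ in pairs)
    s, s_min, s_max = sum(within), sum(ordered[:n_w]), sum(ordered[-n_w:])
    return 0.0 if s_max == s_min else (s - s_min) / (s_max - s_min)
```

In the setting the passage describes, each index would be evaluated on the partitions produced for several candidate numbers of groups, and the number with the best score (highest mean silhouette, lowest C index) would be selected.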