2012
DOI: 10.1016/j.csda.2011.09.003

Selection of the number of clusters via the bootstrap method

Abstract: Here the problem of selecting the number of clusters in cluster analysis is considered. Recently, the concept of clustering stability, which measures the robustness of any given clustering algorithm, has been utilized in Wang (2010) for selecting the number of clusters through cross validation. In this manuscript, an estimation scheme for clustering instability is developed based on the bootstrap, and then the number of clusters is selected so that the corresponding estimated clustering instability is minimize…

citations
Cited by 134 publications
(129 citation statements)
references
References 24 publications
“…For this work, cluster performance is determined by the bootstrap method, which minimizes cluster instability [33]. Figure 6a shows the cluster instability calculated by the bootstrap method for the HCA-average linkage method with the JSD metric in the Fe-Co-Ni system. When the number of clusters is equal to six, the cluster analysis is the most stable.…”
Section: Results (mentioning)
confidence: 99%
“…The nselectboot function is based on the work of Fang and Wang (2012). The authors focus on the concept of stability as robustness to randomness present in the sample. Drawing on the work of Wang (2010), they formulate the concept of stability in the following way: if one draws samples from the population and applies a selected clustering algorithm, the results of grouping should not be very different.…”
Section: For Each Pair (mentioning)
confidence: 99%
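
The stability idea quoted above can be written down compactly. The following is a sketch of one common formalization, not a quotation from the cited papers; the symbols (F for the population distribution, Ψ for the clustering algorithm, Z_n and Z̃_n for two independent samples of size n) are notation assumed here for illustration.

```latex
% Distance between two clusterings \psi_1 and \psi_2 of the same space,
% with X and Y drawn independently from the population distribution F:
% the probability that exactly one of them puts X and Y in the same cluster.
\[
  d_F(\psi_1, \psi_2)
    = \Pr\bigl[\, I\{\psi_1(X) = \psi_1(Y)\} + I\{\psi_2(X) = \psi_2(Y)\} = 1 \,\bigr]
\]

% Instability of algorithm \Psi with k clusters at sample size n:
% the expected distance between clusterings fitted on two independent samples.
\[
  s(\Psi, k, n)
    = \mathbb{E}\bigl[\, d_F\bigl(\Psi(Z_n; k),\; \Psi(\tilde{Z}_n; k)\bigr) \,\bigr]
\]
```

Under this reading, "the results of grouping should not be very different" means that s(Ψ, k, n) is small, and the number of clusters is chosen as the k that minimizes an estimate of it.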
“…The main reason for using FPC is that the number of clusters need not be defined in advance (before running the clustering algorithm), as it must be in the k-means clustering algorithm. FPC computes the number of clusters automatically via the bootstrap: pairs of bootstrap samples are repeatedly drawn from the data, and the number of clusters is chosen by optimizing an instability estimate computed from these pairs, as explained in [13]. This automatic process is crucial, since we want to analyze several products and not every expert would be able to identify the correct number of clusters without some additional training.…”
Section: Using Clustering To Define Reliable Price Range (mentioning)
confidence: 99%
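
The procedure described in these statements (draw pairs of bootstrap samples, cluster each, measure how much the two clusterings disagree, and pick the number of clusters with the lowest average disagreement) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of the paper or of the fpc package: the function names (`kmeans`, `clustering_distance`, `bootstrap_instability`, `select_k`) are hypothetical, a plain k-means with farthest-point initialization stands in for "any given clustering algorithm", and disagreement is measured as the fraction of point pairs that one clustering groups together and the other separates.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the centroid closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points, k, rng, n_iter=50):
    """Lloyd's k-means (a stand-in algorithm); returns the fitted centroids."""
    # Assumed initialization: one random point, then repeatedly the point
    # farthest from the centroids chosen so far.
    distinct = sorted(set(points))
    centers = [rng.choice(distinct)]
    while len(centers) < k:
        centers.append(
            max(distinct, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Empty groups keep their previous centroid.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def clustering_distance(labels_a, labels_b):
    """Fraction of point pairs that exactly one labeling puts together."""
    n = len(labels_a)
    disagree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            disagree += same_a != same_b
    return disagree / (n * (n - 1) / 2)


def bootstrap_instability(points, k, n_boot=20, seed=0):
    """Average disagreement between clusterings fitted on bootstrap pairs."""
    rng = random.Random(seed)
    n = len(points)
    total = 0.0
    for _ in range(n_boot):
        # Two independent bootstrap samples (drawn with replacement).
        sample_a = [points[rng.randrange(n)] for _ in range(n)]
        sample_b = [points[rng.randrange(n)] for _ in range(n)]
        centers_a = kmeans(sample_a, k, rng)
        centers_b = kmeans(sample_b, k, rng)
        # Compare the two fitted clusterings on the original data.
        labels_a = [nearest(p, centers_a) for p in points]
        labels_b = [nearest(p, centers_b) for p in points]
        total += clustering_distance(labels_a, labels_b)
    return total / n_boot


def select_k(points, k_range, **kwargs):
    """Return the k with the lowest estimated instability, plus all scores."""
    scores = {k: bootstrap_instability(points, k, **kwargs) for k in k_range}
    return min(scores, key=scores.get), scores
```

A call such as `select_k(points, range(2, 7), n_boot=50)` takes a list of equal-length numeric tuples and returns the selected k together with the per-k instability estimates. Evaluating both fitted clusterings on the original data, rather than on the bootstrap samples themselves, keeps the two label vectors comparable point by point; that choice is part of this sketch and may differ from what a given package does.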