The Proceedings of the 2nd International Conference on Industrial Application Engineering 2015
DOI: 10.12792/iciae2015.012

The Clustering Validity with Silhouette and Sum of Squared Errors

Abstract: Data clustering with automatic programs such as k-means is a popular technique, widely used in many general applications. This paper studies two sub-activities of the clustering process: selecting the number of clusters and analyzing the result of data clustering. The research aims to study clustering validation in order to find an appropriate number of clusters for the k-means method. The experimental data have 3 shapes, and each shape has 4 datasets (100 items), which diffusion…

Citations: cited by 116 publications (62 citation statements)
References: 16 publications
“…Next, the segments were characterized using a two-step clustering approach (Rundle-Thiele, Kubacki, Tkaczynski, & Parkinson, 2015). The higher the value >0, the more robust the cluster configuration (Thinsungnoena, Kaoungkub, Durongdumronchaib, Kerdprasopb, & Kerdprasopb, 2015). Using the Bayesian information criterion (BIC), the best number of clusters was identified.…”
Section: Discussion (mentioning)
confidence: 99%
“…First, the silhouette measure of cohesion and separation, a measure of how close each point in a cluster is to the points in its neighboring clusters (from −1 to +1), was calculated. The higher the value >0, the more robust the cluster configuration (Thinsungnoena, Kaoungkub, Durongdumronchaib, Kerdprasopb, & Kerdprasopb, 2015). Next, a test of significance was performed on each construct to identify the differences (if any) amongst the clusters.…”
Section: Discussionmentioning
confidence: 99%
“…Along with this, describe a new method for the silhouette to minimize the computation time with reducing addition operations amount during distance calculation, which has experimentally proven that about 50% CPU time gained. It is also a measure that helps in concluding clustering legitimacy and selecting the optimal K value to divide a ratio scale data into distinct classes [38]. For true K the preferable number of clusters whose silhouette value predicted large enough.…”
Section: B. The Silhouette Methods Towards K Finding (mentioning)
confidence: 99%
“…We determine the optimal clustering number using the Silhouette Index/Average Silhouette and Gap Statistic methods to remove this uncertainty. The first technique calculates the average silhouette value of the instances for multiple k values [23]. Optimal number of clusters maximizes the average silhouette score [23].…”
Section: Optimal Number Of Clusters (mentioning)
confidence: 99%
“…The first technique calculates the average silhouette value of the instances for multiple k values [23]. Optimal number of clusters maximizes the average silhouette score [23]. We may note that in Scikit-Learn toolkit, the default range of k is 1 to 10.…”
Section: Optimal Number Of Clusters (mentioning)
confidence: 99%
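The procedure these statements describe (compute the average silhouette for several k values and keep the k that maximizes it) can be sketched in pure Python as below. This is a minimal illustration, not the method of the cited papers: the toy data, the deterministic farthest-first initialization (swapped in for random seeding so the run is reproducible), and all function names are assumptions. Note that the silhouette is only defined for at least two clusters, so the loop starts at k = 2.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def mean_silhouette(points, labels):
    """Average silhouette; singleton clusters contribute s(i) = 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (hypothetical): three tight 3x3 grids of points, far apart.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in ((0.0, 0.0), (10.0, 0.0), (5.0, 9.0))
          for dx in offsets for dy in offsets]

scores, sse_by_k = {}, {}
for k in range(2, 7):
    labels, centers = kmeans(points, k)
    scores[k] = mean_silhouette(points, labels)
    sse_by_k[k] = sse(points, labels, centers)

best_k = max(scores, key=scores.get)  # k with the largest average silhouette
print(best_k, scores[best_k], sse_by_k[best_k])
```

The sketch records SSE alongside the silhouette because the two are read differently: SSE generally keeps shrinking as k grows and is usually inspected for an "elbow", whereas the average silhouette can be compared directly across k values.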