2012
DOI: 10.3233/ida-2012-0545

A comparison of clustering quality indices using outliers and noise

Abstract: Quality indices in clustering are used not only to assess the quality of the partitions but also to determine the number of clusters in the final result. When these indices are evaluated in a case study, real data conditions or different clustering algorithms are seldom taken into account. Here, some of the standard indices used in the literature are compared using more realistic databases that include outliers or noisy dimensions, which is more like a real problem-solving approach. Besides, three different cl…

Citations: cited by 47 publications (25 citation statements)
References: 25 publications

“…However, the process of cluster validation is a complex process [35]. Though various CVIs have been proposed [27,28,36-38], no CVI performs better than all the others. Similarly, not one CVI is suitable and can perform effectively on all kinds of data sets [38-45].…”
Section: Results (mentioning)
confidence: 99%
“…The number of clusters is a basic and necessary input parameter of FCM algorithm, which is generally obtained by various CVIs or prior information from domain knowledge. For a data set without any prior information, the first step to use our method is to determine the optimal number of clusters using the selected most effective CVI from various CVIs [27,28,36-38]. We also note that other factors such as the basic principles of FCM algorithm, the performance of CVIs, and the characteristics of data sets will also affect the clustering process, and thus affect the selection of m. In our future work, we will focus on these factors.…”
Section: Results (mentioning)
confidence: 99%
“…Supervised and semisupervised classification algorithms can be implemented if respectively all or part of the data are associated to class labels; in these cases a fusion of multiple classifier can lead to improved classification performance [41]. The issue of identification of an appropriate number of clusters, among the number of possible partitionings, is addressed computing validity indexes [42] as the Davies-Bouldin index [43], which compares S_c within-cluster distance and d_ce between clusters distance. Other effective validity indexes are the Calinski-Harabasz index and the Gamma index [44].…”
Section: Self-organizing Maps and Clustering Analysis (mentioning)
confidence: 99%
“…Guerra, Robles, Bielza, and Larrañaga () also added noise or outliers to artificial data. They evaluated five internal criteria, Silhouette, CH, C‐Index, DB, Gamma, using artificial data with different yet relatively low dimensionality, number of clusters, outliers, and noise.…”
Section: Configuration/parameterization (mentioning)
confidence: 99%
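
As a rough illustration of the Davies-Bouldin index mentioned in the citation statements above, the following is a minimal sketch in plain Python. It assumes Euclidean distance, uses the mean distance to the centroid as the within-cluster scatter, and averages over clusters the worst-case ratio of summed scatter to centroid separation (lower values suggest better-separated clusters). The function names `centroid` and `davies_bouldin` and the toy data are hypothetical, chosen only for this example; they are not taken from the paper or from any of the citing works.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a partition given as a list of point lists.

    Assumes at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean Euclidean distance to the cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (invented for illustration): two compact, well-separated clusters.
clean = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
# Same partition, but with one outlier assigned to the first cluster.
noisy = [clean[0] + [(5, 5)], clean[1]]

db_clean = davies_bouldin(clean)
db_noisy = davies_bouldin(noisy)
print(f"clean: {db_clean:.3f}  with outlier: {db_noisy:.3f}")
```

On this toy data a single outlier inflates the scatter of the first cluster and pulls its centroid toward the second, so the index gets worse even though the underlying grouping is unchanged. That is the kind of sensitivity a comparison on data with outliers and noise is meant to probe.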