2018
DOI: 10.1080/03610926.2018.1504968

BMS: An improved Dunn index for Document Clustering validation

Cited by 20 publications
(8 citation statements)
References 14 publications
“…DI [32] was used to validate and identify sets of clusters which were compact, with small variations between cluster members and with sufficient distance between other clusters centroids. The optimal number of clusters K=5 was chosen for this research work with maximum Dunn Index value (0.2797) as shown in Figure 5.…”
Section: And Discussion (mentioning)
confidence: 99%
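
For reference, the classical Dunn index that this statement appeals to is usually written as below. This is the standard textbook form, given here only as a sketch: the symbols $\delta$ (between-cluster distance) and $\Delta$ (cluster diameter) are notation assumed for illustration, and the citing paper may use a different variant, for example one based on centroid distances.

```latex
\mathrm{DI} \;=\; \frac{\min_{i \neq j} \, \delta(C_i, C_j)}{\max_{1 \le k \le K} \, \Delta(C_k)},
\qquad
\delta(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} d(x, y),
\qquad
\Delta(C_k) = \max_{x,\, y \in C_k} d(x, y)
```

Larger values indicate compact, well-separated clusters, which is why the number of clusters is chosen at the maximum of the index.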
“…These two phases are iterated for $k=\sqrt{n/2}$ times, where n is the total number of feature vectors. In each iteration the Dunn index (DI) [32] is calculated using the equation 1 and the details of clusters and the DI values are stored in temporary lists. The total number of clusters and their details are considered for further analysis based on the highest DI value.…”
Section: Enhanced K-means Clustering (mentioning)
confidence: 99%
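
The selection loop described in this statement can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: it assumes the classical Dunn index with Euclidean distances (their "equation 1" is not reproduced in the excerpt), and the names `dunn_index` and `select_by_dunn` are hypothetical helpers. The clustering step itself is left out; the sketch only scores labelings that have already been produced for each candidate number of clusters.

```python
import numpy as np


def dunn_index(X, labels):
    """Classical Dunn index: smallest between-cluster distance
    divided by the largest within-cluster diameter (assumed variant)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Largest within-cluster distance (diameter) over all clusters.
    max_diameter = max(
        dist[np.ix_(labels == c, labels == c)].max() for c in ids
    )
    # Smallest distance between points that lie in different clusters.
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    )
    if max_diameter == 0:
        # Every cluster is a single point (or duplicates): treat as unbounded.
        return float("inf")
    return min_separation / max_diameter


def select_by_dunn(X, candidate_labelings):
    """Score each candidate labeling and return the key with the highest DI.

    candidate_labelings maps a candidate number of clusters to the label
    array produced by some clustering run (hypothetical input format).
    """
    scores = {k: dunn_index(X, lab) for k, lab in candidate_labelings.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy usage: two tight, well-separated pairs of points.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k, scores = select_by_dunn(X, candidates)
```

Storing the per-candidate scores in a dictionary mirrors the "temporary lists" mentioned in the statement, so the full set of DI values remains available for inspection after the best candidate is picked.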
“…We experimented with 8 similarity thresholds from 0.6 to 0.95 with 0.05 increments to cluster distress narratives. Though various cluster quality metrics such as the Silhouette coefficient (Rousseeuw 1987), Dunn index (Misuraca, Spano, and Balbi 2019), and average point-to-centroid cosine distance, were computed for each threshold to select an optimal similarity threshold, manual inspection on a subset of 10 clusters at each threshold and cluster visualization revealed that those metrics do not work best for this dataset (Above metrics are known to work best only for datasets having convex-shaped clusters). Results of manual inspection conveyed that the stressors identified at higher thresholds such as 0.95 and 0.9 are too specific and those below 0.8 are too vague (cluster quality metrics and topics discovered through manual inspection at each threshold are included in the appendices).…”
Section: Identification Of Stressors (mentioning)
confidence: 99%
“…The typical validity indices cannot solve the above problems. For example, the improved Dunn index [41] relies on the trial-and-error strategy and specific clustering algorithms. As for SCUBI, the intra- and inter-cluster distances cannot be defined well.…”
Section: Tibshirani's Gap Statistic Index (Gs ↑) [31] (mentioning)
confidence: 99%