2015
DOI: 10.1017/s1351324915000273

Silhouette + attraction: A simple and effective method for text clustering

Abstract: This article presents Sil-Att, a simple and effective method for text clustering, which is based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles allows us to obtain a general technique that can be used either as a boosting method, which improves results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil-Att is able to obtain high quality results on text corpora with very different charac…

Cited by 6 publications (5 citation statements) · References 56 publications

Citation statements:
“…By comparing the BCALoD algorithm, K-means clustering, and mixed Gaussian clustering, the relative performance of the BCALoD algorithm was determined. In K-means clustering, the silhouette coefficient [17] is used to calculate the cohesion and separation degrees, and the number of clusters that maximizes the silhouette coefficient is selected, which completes K-means clustering. In mixed Gaussian clustering, the Bayesian information criterion (BIC) [18] is used to select the number of clusters, and the number of clusters that minimizes the BIC is chosen for the data.…”
Section: Results · Citation type: mentioning · Confidence: 99%
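The model-selection procedure this statement describes can be sketched concretely. The snippet below is an illustrative example using scikit-learn, not code from the cited works: the number of K-means clusters is chosen by maximizing the silhouette coefficient, and the number of Gaussian mixture components by minimizing the BIC. The function names, candidate range, and toy data are assumptions made for illustration.

```python
# Illustrative sketch (not from the cited papers): choose k for K-means by
# maximizing the silhouette coefficient, and the number of Gaussian mixture
# components by minimizing the BIC.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

def select_k_by_silhouette(X, k_range=range(2, 11)):
    """Return (k, score) where k maximizes the silhouette coefficient for K-means."""
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

def select_k_by_bic(X, k_range=range(2, 11)):
    """Return (k, bic) where k minimizes the BIC of a Gaussian mixture."""
    best_k, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k, best_bic

if __name__ == "__main__":
    # Toy data (assumed for the example): three well-separated 2-D blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 3, 6)])
    print(select_k_by_silhouette(X))   # expected to pick k = 3
    print(select_k_by_bic(X))          # expected to pick k = 3
```

Both searches scan the same candidate range; the only difference is the criterion being optimized (maximum silhouette versus minimum BIC), which mirrors the two procedures in the quoted statement.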
“…Two measures are employed to estimate the clustering effect of ICs and PCs: SC and K-means. SC is a classic validity measure for clustering problems [23]. The SC combines two key aspects to determine the quality of a cluster: cohesion and separation.…”
Section: Results · Citation type: mentioning · Confidence: 99%
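The combination of cohesion and separation referred to here is the standard per-sample silhouette, s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster (cohesion) and b(i) is the smallest mean distance from i to any other cluster (separation). The following is a minimal reference sketch; the function name and data handling are assumptions for illustration and are not taken from the cited paper.

```python
# Illustrative O(n^2) sketch of the per-sample silhouette coefficient:
# s(i) = (b(i) - a(i)) / max(a(i), b(i)).
import numpy as np

def silhouette_samples_naive(X, labels):
    """Per-sample silhouette values; assumes at least two distinct clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():                   # singleton cluster: silhouette set to 0
            continue
        a = d[i, same].mean()                # cohesion: mean intra-cluster distance
        b = min(d[i, labels == c].mean()     # separation: nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s
```

Averaging these per-sample values over the whole data set yields the overall silhouette coefficient used for model selection in the statement above.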