2020
DOI: 10.1007/978-3-030-51310-8_10

Enhancement of Short Text Clustering by Iterative Classification

Abstract: Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering algorithm. Thus, …
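The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how outliers are detected or which classifier is used, so this sketch assumes that outliers are the points farthest from their own cluster centroid and swaps in a nearest-centroid rule as the classifier. The function names, the `outlier_fraction` parameter, and the toy 2-D points standing in for text vectors are all hypothetical.

```python
from collections import defaultdict
from math import dist


def centroids(points, labels, keep):
    """Mean vector per cluster, computed only over the indices in `keep`."""
    groups = defaultdict(list)
    for i in keep:
        groups[labels[i]].append(points[i])
    return {c: tuple(sum(col) / len(vs) for col in zip(*vs))
            for c, vs in groups.items()}


def iterative_classification(points, labels, outlier_fraction=0.25, max_iter=20):
    """Reassign outliers with a classifier trained on non-outliers until stable."""
    labels = list(labels)  # initial labels come from any clustering algorithm
    all_idx = range(len(points))
    for _ in range(max_iter):
        cents = centroids(points, labels, all_idx)
        # Assumption: the outliers are the points farthest from their own centroid.
        ranked = sorted(all_idx,
                        key=lambda i: dist(points[i], cents[labels[i]]),
                        reverse=True)
        outliers = set(ranked[:int(len(points) * outlier_fraction)])
        inliers = [i for i in all_idx if i not in outliers]
        # "Training" here means computing centroids from the non-outliers only.
        model = centroids(points, labels, inliers)
        new_labels = list(labels)
        for i in outliers:
            # Classify each outlier into the cluster with the nearest centroid.
            new_labels[i] = min(model, key=lambda c: dist(points[i], model[c]))
        if new_labels == labels:  # assignment has stabilized
            break
        labels = new_labels
    return labels


# Toy data: two well-separated groups, with (1, 1) initially mislabeled as cluster 1.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(iterative_classification(pts, [0, 0, 0, 1, 1, 1, 1, 1]))
# -> [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run the mislabeled point is flagged as an outlier in the first pass, reassigned by the model trained on the remaining points, and the second pass leaves every label unchanged, which ends the loop.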

Cited by 23 publications (13 citation statements)
References 16 publications
“…For text clustering, we compare the proposed TCL with 11 benchmarks, including TF/TF-IDF (Jones, 1972), BagOfWords (BOW) (Harris, 1954), SkipVec (Kiros et al, 2015), Para2Vec (Le and Mikolov, 2014), GSDPMM (Yin and Wang, 2016), RecNN (Socher et al, 2011), STCC (Xu et al, 2017b), HAC-SD (Rakib et al, 2020), ECIC (Rakib et al, 2020), and SCCL (Zhang et al, 2021a). Similarly, the vanilla k-means is conducted on the extracted features to cluster data for those representation-based methods, including BOW, TF/TF-IDF, SkipVec, Para2Vec, and RecNN.…”
Section: Compared Methods (mentioning)
confidence: 99%
“…Another interesting technique concerning intersections of classification and clustering of short texts is presented in [34] where a classifier is trained with cluster labels to improve the previous clustering.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Rajan et al [44] depict a clustering process to aggregate patent descriptions into similar groups to facilitate the search process in patent databases. Rakib et al [45] propose an iterative classification method that improves the clustering of short texts. This is done by detecting outliers during the clustering process and changing the clusters to which they are assigned.…”
Section: Related Work (mentioning)
confidence: 99%