2016
DOI: 10.1177/0165551516638784

An improved ant algorithm with LDA-based representation for text document clustering

Abstract: Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackl…

Citations: cited by 63 publications (28 citation statements)
References: 69 publications

“…This method finds an optimal number of topics to cluster documents. Onan et al [6] proposed a K-means clustering using LDA. Tagarelli et al [7] proposed a method to cluster text segments from multi-topic documents.…”
Section: Introduction (mentioning)
confidence: 99%
“…Qiu and Xu [17] presented a clustering method, where the LDA was used to extract topics from the texts and the centroids of the K-means algorithm were selected among the nouns with the highest probability values. More recently, Onan et al [18] proposed an improved ant clustering algorithm, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. The latent Dirichlet allocation (LDA) was used to represent textual documents.…”
Section: Topic Modeling in Document Clustering (mentioning)
confidence: 99%
“…In this scheme, several different diversity measures (such as Q-statistics, correlation coefficient, Kappa statistics, and double fault measure) are combined via a genetic algorithm. Similarly, Onan et al [19, 20] introduced a hybrid ensemble pruning algorithm based on consensus clustering and multiobjective evolutionary algorithm. In this scheme, classifiers are assigned into clusters based on their predictive performance and the set of candidate classifiers are explored through the use of evolutionary algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the presented scheme, swarm-optimized approach is employed to estimate the parameters of LDA, including the number of topics and all the other parameters involved in LDA. Motivated by the success of hybrid ensemble pruning schemes [19-21], the proposed approach combines diversity measures and clustering. In this scheme, four different diversity measures (namely, disagreement measure, Q-statistics, the correlation coefficient, and the double fault measure) are computed to capture the diversities within the ensemble.…”
Section: Introduction (mentioning)
confidence: 99%
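
The last citation statement above names four pairwise diversity measures without defining them. As a minimal sketch, assuming the standard textbook definitions based on a 2x2 table of joint correct/incorrect predictions for two classifiers, they could be computed as follows. The function name `pairwise_diversity` and the sample vectors are hypothetical and chosen only for illustration; nothing here is taken from the cited papers.

```python
from math import sqrt


def pairwise_diversity(correct_a, correct_b):
    """Pairwise diversity of two classifiers (illustrative sketch).

    Each argument is a sequence of truthy/falsy flags: True where that
    classifier labelled the sample correctly. Textbook definitions are
    assumed; the citing papers may use variants.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # 2x2 contingency counts: n11 both correct, n00 both wrong,
    # n10 only the first correct, n01 only the second correct.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00

    agree = n11 * n00
    differ = n01 * n10

    # Q-statistic lies in [-1, 1]; defined as 0.0 here when the denominator vanishes.
    q = (agree - differ) / (agree + differ) if (agree + differ) else 0.0

    # Correlation coefficient between the two correctness vectors.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    rho = (agree - differ) / denom if denom else 0.0

    return {
        "q_statistic": q,
        "correlation": rho,
        "disagreement": (n01 + n10) / n,  # share of samples where exactly one is right
        "double_fault": n00 / n,          # share of samples where both are wrong
    }


# Hypothetical correctness flags for two classifiers on eight samples.
clf_a = [1, 1, 1, 0, 0, 1, 0, 1]
clf_b = [1, 0, 1, 0, 1, 1, 0, 0]
print(pairwise_diversity(clf_a, clf_b))
```

Lower Q-statistic and correlation values, and higher disagreement, indicate a more diverse pair. A low double-fault value means the two classifiers rarely fail on the same sample.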