Proceedings of the 2022 International Conference on Computer, Control, Informatics and Its Applications
DOI: 10.1145/3575882.3575905
Performance Comparison of Topic Modeling Algorithms on Indonesian Short Texts

Cited by 2 publications (4 citation statements)
References 11 publications
“…This strategic filtering addresses LDA's shortcomings in handling brief texts and aligns with its general efficiency when dealing with more extensive textual inputs. This finding aligns with previous research that has highlighted LDA's challenges in modeling concise texts when compared to other advanced topic modeling methods [15]. Departing from prior research that explored a broader range of topic counts, ranging from 40 to 60, with various kernel modifications for SVM [7][8], our study adopts a more focused approach to topic allocation.…”
Section: Discussion and Comparative Analysis (supporting)
confidence: 81%
“…LDA represents each document as a mixture of latent topics and each topic as a distribution over words from the vocabulary. The steps and parameters that we use in feature extraction with LDA, following our prior study [15], involve defining the vocabulary of words V, the number of topics K, and the documents D, where V = {v1, v2, v3, ..., vn}, K = {2, ..., 50}, D = {d1, d2, d3, ..., dm}. We systematically explore the number of topics K to find the optimum coherence value.…”
Section: Preprocessing Data (mentioning)
confidence: 99%
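The K-sweep described in the quoted passage (fit LDA for each candidate K, keep the K with the best topic coherence) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy Indonesian corpus, the reduced K range, and the use of scikit-learn's LDA with a hand-rolled UMass-style coherence are all assumptions; the cited study sweeps K = 2..50 on real short texts.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus of short Indonesian texts.
docs = [
    "harga pasar naik", "pasar saham turun", "harga saham naik",
    "tim bola menang", "tim bola kalah", "bola menang besar",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)  # document-term matrix over vocabulary V

def umass_coherence(top_words, X):
    """Average UMass score over ordered pairs of a topic's top word indices."""
    Xb = (X > 0).astype(int)
    df = np.asarray(Xb.sum(axis=0)).ravel()   # document frequency per word
    co = (Xb.T @ Xb).toarray()                # co-document frequency matrix
    score, pairs = 0.0, 0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            score += np.log((co[wi, wj] + 1) / df[wj])
            pairs += 1
    return score / pairs

# Sweep candidate topic counts K and keep the most coherent model
# (the paper uses K = 2..50; a tiny range suffices for illustration).
best_k, best_c = None, -np.inf
for k in range(2, 5):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    topic_scores = []
    for topic in lda.components_:
        top = topic.argsort()[::-1][:3]       # top-3 word indices of the topic
        topic_scores.append(umass_coherence(top, X))
    c = float(np.mean(topic_scores))
    if c > best_c:
        best_k, best_c = k, c

print(best_k, round(best_c, 3))
```

In practice the coherence measure would be computed on the full corpus (e.g. gensim's `CoherenceModel` with `c_v` or `u_mass`), but the selection loop has the same shape: one model per K, one coherence score per model, argmax over K.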
“…This likelihood serves as a basis for model comparison, where a higher likelihood indicates a better model. The perplexity of held-out texts is a common evaluation measure for topic models and is defined in (7).…”
Section: Perplexity (mentioning)
confidence: 99%
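The held-out perplexity referenced above is conventionally perplexity(D_test) = exp(−log p(D_test) / N), where N is the number of held-out word tokens. A minimal sketch, assuming a scikit-learn LDA model and a hypothetical toy corpus (the quoted paper's equation (7) is not reproduced here):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical training and held-out short texts.
train = ["harga pasar naik", "pasar saham turun", "tim bola menang"]
held_out = ["harga saham naik", "tim bola kalah"]

vec = CountVectorizer()
X_train = vec.fit_transform(train)
X_test = vec.transform(held_out)   # out-of-vocabulary tokens are dropped

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)

# score() returns an approximate log-likelihood bound for the held-out docs;
# dividing by the token count and exponentiating gives perplexity.
log_likelihood = lda.score(X_test)
n_tokens = X_test.sum()
perplexity = np.exp(-log_likelihood / n_tokens)

print(round(float(perplexity), 2))
```

Lower perplexity means the model assigns higher probability to unseen text; scikit-learn's `lda.perplexity(X_test)` computes the same quantity directly.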
“…However, concerning the Indonesian language, additional investigation is essential to refine techniques for generating more accurate and meaningful topic models [6]. Hidayati and Parlina [7] compared the topic extraction performance of LDA, non-negative matrix factorization (NMF), and Gibbs sampling Dirichlet multinomial mixture (GSDMM) algorithms on Indonesian short texts. Their study found that LDA outperformed NMF and GSDMM in topic coherence scores, but human judgment showed that the word clusters generated by NMF and GSDMM were easier to interpret.…”
Section: Introduction (mentioning)
confidence: 99%