2020
DOI: 10.1007/978-981-33-4673-4_27

Normalized Approach to Find Optimal Number of Topics in Latent Dirichlet Allocation (LDA)

Cited by 49 publications (16 citation statements)
References 15 publications
“…One area of contention in LDA is how many topics to use from a corpus (Zhao et al, 2015). Hasan et al (2021), in their research to examine the methods to find the optimal number of topics in LDA, established that a combination of perplexity score and coherence score works better than existing methods to determine the optimal number of topics. After reaching the optimal number of topics, the topics' relevance and meaningfulness must be assured.…”
Section: Topic Modelling (mentioning)
confidence: 99%
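The statement above summarises the cited approach: sweep candidate topic counts and pick the one that balances a low perplexity against a high coherence score. The sketch below illustrates that idea under stated assumptions: it uses scikit-learn's `LatentDirichletAllocation`, a hand-rolled UMass-style coherence, a toy four-document corpus, and a simple min-max-normalised combination rule; none of these specifics come from Hasan et al. (2021) themselves.

```python
# Illustrative sketch (not the authors' exact method): choose the number of
# LDA topics k by combining perplexity (lower is better) with a UMass-style
# coherence score (higher is better) over a small grid of candidate k values.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: two rough themes (pets, finance). Purely an assumption for demo.
docs = [
    "cats purr and chase mice",
    "dogs bark and chase cats",
    "stocks rise when markets rally",
    "markets fall and stocks drop",
]
X = CountVectorizer().fit_transform(docs)

def umass_coherence(lda, X, top_n=3):
    """Mean UMass coherence over topics: sum of log((D(w_i, w_j) + 1) / D(w_j))
    over the top_n words of each topic, averaged across topics."""
    dtm = (X > 0).toarray()  # binary document-term incidence matrix
    scores = []
    for topic in lda.components_:
        top = np.argsort(topic)[::-1][:top_n]
        s = 0.0
        for i in range(1, len(top)):
            for j in range(i):
                co = np.sum(dtm[:, top[i]] & dtm[:, top[j]])  # co-occurrence count
                dj = np.sum(dtm[:, top[j]])                   # document frequency
                s += np.log((co + 1) / dj)
        scores.append(s)
    return float(np.mean(scores))

results = {}
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    results[k] = (lda.perplexity(X), umass_coherence(lda, X))

# Simple combination rule (an assumption, not from the paper): min-max
# normalise both scores across candidates, then prefer high coherence
# minus high perplexity.
perp = np.array([results[k][0] for k in results])
coh = np.array([results[k][1] for k in results])
norm = lambda a: (a - a.min()) / (a.max() - a.min() + 1e-12)
best_k = list(results)[int(np.argmax(norm(coh) - norm(perp)))]
print(best_k)
```

On a realistic corpus the candidate grid would be wider and the two metrics would typically be plotted against k rather than collapsed into a single score.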
“…LDA topic analysis requires input parameters to increase their performance, for example, the number of topics (k) and a prior Dirichlet topic distribution in the function. However, finding the best number of topics is challenging (43), especially if there is no prior knowledge about the data. The LDA topic model has a problem in that it does not give the optimal number of topics for the text itself, the exact number of topics being determined by the model user in other ways (44).…”
Section: Methods (mentioning)
confidence: 99%
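The two inputs this statement mentions, the topic count k and the Dirichlet priors, can be made concrete with a minimal sketch. The parameter names below follow scikit-learn's `LatentDirichletAllocation` (`doc_topic_prior` is the document-topic alpha, `topic_word_prior` the topic-word eta); the corpus and prior values are illustrative assumptions only.

```python
# Minimal sketch: the user, not the model, supplies k and the Dirichlet priors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats chase mice", "dogs chase cats", "stocks rise", "markets fall"]
X = CountVectorizer().fit_transform(docs)

k = 2  # must be chosen externally; LDA does not infer it from the text
lda = LatentDirichletAllocation(
    n_components=k,
    doc_topic_prior=0.1,    # alpha: Dirichlet prior over document-topic weights
    topic_word_prior=0.01,  # eta: Dirichlet prior over topic-word weights
    random_state=0,
).fit(X)

doc_topics = lda.transform(X)  # shape (n_docs, k); each row sums to ~1
```

Sweeping k (as in the perplexity/coherence approach discussed above) is one common way to pin down this otherwise free parameter.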
“…Another possible approach would have consisted in performing topic modeling, for instance by ways of a Latent Dirichlet Allocation on the word frequency matrix, to then infer a distribution of topics for every county. It is, however, more computationally intensive, and poses questions about the selection of the number of topics, their interpretability, and their internal coherence (Arun et al, 2010; Hasan et al, 2021). In a case like ours where documents are so large (aggregating all tweets in a county), it is far from obvious to select a number of topics such that there is little overlap between them and to know that these topics are actually representative of the dataset as a whole.…”
Section: Dataset (mentioning)
confidence: 99%