2022
DOI: 10.1007/s00180-022-01246-z

Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data

Abstract: Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we pro…
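To make the comparison concrete, here is a minimal sketch of fitting the standard LDA model to a toy corpus of short, tweet-like texts. It assumes scikit-learn is available; the toy documents and parameter values are illustrative only, not the paper's pseudo-document simulation setup. GSDMM and GPM are not in scikit-learn (a GSDMM sketch follows the citation statements below).

```python
# Minimal sketch (assumption: scikit-learn installed) of standard LDA on
# short, tweet-like documents. Toy corpus and parameters are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "new phone battery dies fast",
    "love the camera on this phone",
    "rain all week gloomy weather",
    "sunny weather perfect for a walk",
]

# Bag-of-words counts; short texts yield a very sparse document-term matrix,
# which is exactly the regime the paper says is hard for standard LDA.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # rows: documents, columns: topic proportions
print(doc_topic.round(2))
```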

Cited by 26 publications (8 citation statements) · References 18 publications
“…It also has a precise method for balancing the completeness and homogeneity of the clustering results. It can also obtain the representative words of each cluster (Weisser et al , 2022).…”
Section: Methods (mentioning)
confidence: 99%
“…The researcher [4] explores the challenge of applying topic models to the brief and sparse texts commonly found in social media micro-blogs, such as Twitter. A comparison of three models is carried out, which include the standard Latent Dirichlet Allocation (LDA), the Gibbs Sampler Dirichlet Multinomial Model (GSDMM), and the Gamma Poisson Mixture Model (GPM), the latter two being specifically tailored for sparse data.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Comparing GSDMM and LDA, the GSDMM is better suited to short texts as it assumes that there is one topic in the text [9,10].…”
Section: Brief Justification For Choosing GSDMM For Text Clustering I... (mentioning)
confidence: 99%
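Because the statement above turns on GSDMM's one-topic-per-document assumption, here is a minimal, self-contained sketch of the collapsed Gibbs sampler behind GSDMM (the Dirichlet Multinomial Mixture of Yin and Wang, 2014). The hyperparameters and toy corpus are assumptions for illustration, not the paper's settings.

```python
import math
import random
from collections import Counter

def gsdmm(docs, vocab_size, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Collapsed Gibbs sampler for the Dirichlet Multinomial Mixture behind
    GSDMM: each document gets exactly ONE topic, which is the assumption
    that makes the model suit short texts."""
    rng = random.Random(seed)
    D, V = len(docs), vocab_size
    z = [rng.randrange(K) for _ in docs]   # one cluster label per document
    m = [0] * K                            # number of documents in each cluster
    n = [0] * K                            # total word count in each cluster
    nw = [Counter() for _ in range(K)]     # per-cluster word counts
    for d, doc in enumerate(docs):
        k = z[d]
        m[k] += 1; n[k] += len(doc); nw[k].update(doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            k = z[d]                       # take the document out of its cluster
            m[k] -= 1; n[k] -= len(doc); nw[k].subtract(doc)
            counts = Counter(doc)
            weights = []
            for k in range(K):
                # Yin & Wang (2014), Eq. 4, in log space; the constant
                # denominator D - 1 + K*alpha is the same for every k and dropped.
                lp = math.log(m[k] + alpha)
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(n[k] + V * beta + i)
                # exp is safe here because toy documents are short;
                # use log-sum-exp normalisation for real corpora.
                weights.append(math.exp(lp))
            k = rng.choices(range(K), weights=weights)[0]  # resample the cluster
            z[d] = k
            m[k] += 1; n[k] += len(doc); nw[k].update(doc)
    return z, nw
```

Usage on tokenised short texts; `most_common` recovers per-cluster representative words of the kind the first citation statement refers to:

```python
docs = [t.split() for t in [
    "new phone battery dies fast",
    "love the camera on this phone",
    "rain all week gloomy weather",
    "sunny weather perfect for a walk",
]]
vocab = {w for doc in docs for w in doc}
labels, cluster_words = gsdmm(docs, len(vocab), K=4)
for k, words in enumerate(cluster_words):
    top = [(w, c) for w, c in words.most_common(3) if c > 0]
    if top:
        print(k, top)
```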