Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008
DOI: 10.1145/1401890.1401960

Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Abstract: In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than th…
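
For reference, below is a minimal sketch of the conventional O(K) collapsed Gibbs update that the abstract contrasts against. The count arrays n_wk, n_dk, n_k and the hyperparameters alpha and beta follow standard LDA notation, but the function itself is an illustration, not code from the paper.

```python
import numpy as np

def gibbs_update(w, d, z_old, n_wk, n_dk, n_k, alpha, beta, V):
    """One conventional collapsed Gibbs update for a single token.

    Evaluates the full conditional for all K topics, hence O(K) work.
    n_wk[w, k]: tokens of word w assigned to topic k
    n_dk[d, k]: tokens in document d assigned to topic k
    n_k[k]:     total tokens assigned to topic k
    """
    K = n_k.shape[0]
    # Remove the token's current assignment from the counts.
    n_wk[w, z_old] -= 1; n_dk[d, z_old] -= 1; n_k[z_old] -= 1
    # Unnormalized full conditional p(z = k | rest) for every topic k.
    p = (n_dk[d] + alpha) * (n_wk[w] + beta) / (n_k + V * beta)
    z_new = np.random.choice(K, p=p / p.sum())
    # Record the new assignment.
    n_wk[w, z_new] += 1; n_dk[d, z_new] += 1; n_k[z_new] += 1
    return z_new
```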


Cited by 447 publications (299 citation statements) · References 11 publications
“…Han and Sun propose a more complex hierarchical model and perform inference using incremental Gibbs sampling rather than with pre-constructed topics [5]. Porteous et al. speed up LDA Gibbs sampling by bounding the normalizing constant of the sampling distribution [14]. They report up to 8 times speedup on a few thousand topics.…”
Section: Background and Related Work
confidence: 99%
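
The excerpt above summarizes the paper's core trick: the full conditional is accumulated topic by topic, and a bound on the normalizing constant lets the scan stop before touching all K topics. The sketch below is a simplified, generic version of that idea, not the paper's exact algorithm (FastLDA derives its bounds from Hölder's inequality and visits topics in a data-driven order); weight and tail_bound are hypothetical callables standing in for the LDA-specific terms, and strictly positive weights are assumed.

```python
import numpy as np

def bounded_sample(weight, tail_bound, K, rng):
    """Draw topic j with probability weight(j)/Z without always computing Z.

    weight(k):     unnormalized probability of topic k (evaluated lazily)
    tail_bound(k): upper bound on the total weight of topics k+1 .. K-1
    """
    u = rng.random()
    s = np.zeros(K + 1)                    # s[k] = sum of the first k weights
    for k in range(1, K + 1):
        s[k] = s[k - 1] + weight(k - 1)
        upper = s[k] + tail_bound(k - 1)   # upper >= true normalizer Z
        lower = s[k]                       # lower <= true normalizer Z
        # The sample is topic j iff s[j-1] < u*Z <= s[j].  With Z unknown,
        # u*upper <= s[j] and u*lower > s[j-1] together pin j down exactly.
        j = int(np.searchsorted(s[:k + 1], u * upper))
        if 1 <= j <= k and u * lower > s[j - 1]:
            return j - 1                   # topic found after only k weights
    # Bounds never closed the gap: fall back to the exact normalizer s[K].
    return int(np.searchsorted(s[1:], u * s[K]))
```

With a tail bound of zero this degenerates to the standard linear scan; the speedup comes entirely from how tight and how cheap tail_bound is, which is exactly what the paper's Hölder-based bounds and topic ordering are designed to deliver.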
“…Based on this assumption, quite a few algorithms have been developed, trying to recover the topic vectors given a corpus [4,3,5,6,7]. As we will see in Section 3, by verifying this assumption on real datasets, we found that the traditional model is highly inaccurate.…”
Section: Introduction
“…Then the words of all T kinds are drawn by first sampling a topic indicator k for the word from θ and then drawing the word from the per-type topic word distributions β_{t,k}. Since exact inference is intractable for the model, we use a collapsed Gibbs sampler [19] for approximate inference. θ, the document-topic distribution obtained after inference, provides an estimate of the predicted cluster membership of an entity document.…”
Section: Entity Clustering
confidence: 99%
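
For concreteness, the generative step this excerpt describes (a topic indicator k drawn from θ, then a word drawn from the type-specific distribution β_{t,k}) might look as follows. All names and dimensions here are illustrative, not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, K, V = 3, 10, 5000   # word types, topics, vocabulary size (illustrative)
theta = rng.dirichlet(np.full(K, 0.1))               # document-topic distribution theta
beta = rng.dirichlet(np.full(V, 0.01), size=(T, K))  # per-type topic-word dists beta[t, k]

def draw_word(t):
    """Draw one word of type t: topic k ~ theta, then word ~ beta[t, k]."""
    k = rng.choice(K, p=theta)
    return rng.choice(V, p=beta[t, k])
```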