Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008
DOI: 10.1145/1401890.1401960

Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Abstract: In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than th…
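
For reference, below is a minimal sketch of the conventional O(K) collapsed Gibbs update that the abstract contrasts against. The count arrays n_wk, n_dk, n_k and the hyperparameters alpha and beta follow standard LDA notation, but the function itself is an illustration, not code from the paper.

```python
import numpy as np

def gibbs_update(w, d, z_old, n_wk, n_dk, n_k, alpha, beta, V):
    """One conventional collapsed Gibbs update for a single token.

    Evaluates the full conditional for all K topics, hence O(K) work.
    n_wk[w, k]: tokens of word w assigned to topic k
    n_dk[d, k]: tokens in document d assigned to topic k
    n_k[k]:     total tokens assigned to topic k
    """
    K = n_k.shape[0]
    # Remove the token's current assignment from the counts.
    n_wk[w, z_old] -= 1; n_dk[d, z_old] -= 1; n_k[z_old] -= 1
    # Unnormalized full conditional p(z = k | rest) for every topic k.
    p = (n_dk[d] + alpha) * (n_wk[w] + beta) / (n_k + V * beta)
    z_new = np.random.choice(K, p=p / p.sum())
    # Record the new assignment.
    n_wk[w, z_new] += 1; n_dk[d, z_new] += 1; n_k[z_new] += 1
    return z_new
```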


Cited by 447 publications (299 citation statements) · References 11 publications
“…Han and Sun propose a more complex hierarchical model and perform inference using incremental Gibbs sampling rather than with pre-constructed topics [5]. Porteous et al. speed up LDA Gibbs sampling by bounding the normalizing constant of the sampling distribution [14]. They report up to 8 times speedup on a few thousand topics.…”
Section: Background and Related Work
confidence: 99%
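
The excerpt above summarizes the paper's core trick: the full conditional is accumulated topic by topic, and a bound on the normalizing constant lets the scan stop before touching all K topics. The sketch below is a simplified, generic version of that idea, not the paper's exact algorithm (FastLDA derives its bounds from Hölder's inequality and visits topics in a data-driven order); weight and tail_bound are hypothetical callables standing in for the LDA-specific terms, and strictly positive weights are assumed.

```python
import numpy as np

def bounded_sample(weight, tail_bound, K, rng):
    """Draw topic j with probability weight(j)/Z without always computing Z.

    weight(k):     unnormalized probability of topic k (evaluated lazily)
    tail_bound(k): upper bound on the total weight of topics k+1 .. K-1
    """
    u = rng.random()
    s = np.zeros(K + 1)                    # s[k] = sum of the first k weights
    for k in range(1, K + 1):
        s[k] = s[k - 1] + weight(k - 1)
        upper = s[k] + tail_bound(k - 1)   # upper >= true normalizer Z
        lower = s[k]                       # lower <= true normalizer Z
        # The sample is topic j iff s[j-1] < u*Z <= s[j].  With Z unknown,
        # u*upper <= s[j] and u*lower > s[j-1] together pin j down exactly.
        j = int(np.searchsorted(s[:k + 1], u * upper))
        if 1 <= j <= k and u * lower > s[j - 1]:
            return j - 1                   # topic found after only k weights
    # Bounds never closed the gap: fall back to the exact normalizer s[K].
    return int(np.searchsorted(s[1:], u * s[K]))
```

With a tail bound of zero this degenerates to the standard linear scan; the speedup comes entirely from how tight and how cheap tail_bound is, which is exactly what the paper's Hölder-based bounds and topic ordering are designed to deliver.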
“…Based on this assumption, quite a few algorithms have been developed, trying to recover the topic vectors given a corpus [4,3,5,6,7]. As we will see in Section 3, by verifying this assumption on real datasets, we found that the traditional model is highly inaccurate.…”
Section: Introduction
“…Then the words of all T kinds are drawn by first sampling a topic indicator k for the word from θ and then drawing the word from the per-type topic word distributions β_{t,k}. Since exact inference is intractable for the model, we use a collapsed Gibbs sampler [19] for approximate inference. θ, the document-topic distribution obtained after inference, provides an estimate of the predicted cluster membership of an entity document.…”
Section: Entity Clustering
confidence: 99%
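
For concreteness, the generative step this excerpt describes (a topic indicator k drawn from θ, then a word drawn from the type-specific distribution β_{t,k}) might look as follows. All names and dimensions here are illustrative, not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, K, V = 3, 10, 5000   # word types, topics, vocabulary size (illustrative)
theta = rng.dirichlet(np.full(K, 0.1))               # document-topic distribution theta
beta = rng.dirichlet(np.full(V, 0.01), size=(T, K))  # per-type topic-word dists beta[t, k]

def draw_word(t):
    """Draw one word of type t: topic k ~ theta, then word ~ beta[t, k]."""
    k = rng.choice(K, p=theta)
    return rng.choice(V, p=beta[t, k])
```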