2020
DOI: 10.1162/tacl_a_00325
Topic Modeling in Embedding Spaces

Abstract: Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the embedded topic model (ETM), a generative model of documents that marries traditional topic models with word embeddings. More specifically, the ETM models each word with a categorical distribution whose natural parameter is the inner product between the word's embedding and an embedding of its assigned topic.
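
In symbols, and only as a minimal sketch (the notation ρ for the word embedding matrix and α_k for a topic embedding follows the standard ETM write-up and is an assumption here, not text from this page), the per-topic word distribution described above is

    p(w_{dn} = v \mid z_{dn} = k) = \mathrm{softmax}(\rho^\top \alpha_k)_v = \frac{\exp(\rho_v^\top \alpha_k)}{\sum_{v'} \exp(\rho_{v'}^\top \alpha_k)},

where ρ_v is the embedding of vocabulary word v and α_k is the embedding of the assigned topic k.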

Cited by 455 publications (314 citation statements)
References 27 publications
“…Using one of the most popular embedding algorithms, a generalization of the original word2vec algorithm (Mikolov, Chen, Corrado, & Dean, 2013), termed GloVe (Pennington, Socher, & Manning, 2014), our selected target word “feeling” produced high similarity scores for “i,” “today,” and “good” when considering the full document. LDA has recently been integrated with embeddings (e.g., Dieng, Ruiz, & Blei, 2019), both in using topics to create embeddings and in identifying latent topics directly from embeddings; consequently, the LDA model can better incorporate the context of each word, moving beyond an unordered representation of words.…”
Section: Statistical Algorithms (mentioning)
confidence: 99%
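
The nearest-neighbour behaviour described in this excerpt can be sketched with pretrained GloVe vectors. The snippet below is illustrative only: it assumes the gensim library and its bundled 100-dimensional GloVe model, not the cited study's actual setup.

    import gensim.downloader as api

    # Pretrained GloVe vectors (Wikipedia + Gigaword); the model name is an assumption.
    glove = api.load("glove-wiki-gigaword-100")

    # Cosine-similarity nearest neighbours of an illustrative target word.
    for word, score in glove.most_similar("feeling", topn=5):
        print(f"{word}\t{score:.3f}")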
“…Xu et al. [30] adopted Wasserstein distance with a distillation mechanism to learn topics and word embeddings jointly. Dieng et al. [31] used the inner product between a word embedding and an embedding of the assigned topic to parameterize the categorical word distribution in topic models.…”
Section: Related Work (mentioning)
confidence: 99%
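
As a rough numerical illustration of the parameterization attributed to Dieng et al. [31] above (a sketch with made-up dimensions, not the authors' code), each topic embedding is turned into a categorical distribution over the vocabulary via a softmax over inner products:

    import numpy as np

    rng = np.random.default_rng(0)
    V, L, K = 5000, 300, 50          # vocabulary size, embedding dim, topics (illustrative)
    rho = rng.normal(size=(V, L))    # word embeddings, one row per vocabulary word
    alpha = rng.normal(size=(K, L))  # one embedding per topic

    # Inner products are the natural parameters of the categorical word distribution;
    # a column-wise softmax turns them into p(word | topic).
    logits = rho @ alpha.T                      # shape (V, K)
    beta = np.exp(logits - logits.max(axis=0))
    beta /= beta.sum(axis=0)                    # column k sums to 1 over the vocabulary

Each column of beta is one topic's word distribution; words whose embeddings lie close to a topic's embedding receive high probability under that topic.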
“…Recently, a lot of work has harnessed topic modeling (Blei et al. 2003) along with word vectors to learn better word and sentence representations, e.g., LDA (Chen and Liu 2014), weight-BoC (Kim, Kim, and Cho 2017), TWE, NTSG (Liu, Qiu, and Huang 2015), WTM (Fu et al. 2016), w2v-LDA (Nguyen et al. 2015), TV+MeanWV (Li et al. 2016a), LTSG (Law et al. 2017), Gaussian-LDA (Das, Zaheer, and Dyer 2015), Topic2Vec (Niu et al. 2015), ETM (Dieng, Ruiz, and Blei 2019b), LDA2vec (Moody 2016), D-ETM (Dieng, Ruiz, and Blei 2019a), and MvTM. (Kiros et al. 2015) propose skip-thought document embedding vectors, which transformed the idea of abstracting the distributional hypothesis from word to sentence level.…”
Section: Related Work (mentioning)
confidence: 99%