Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Short Papers), 2021
DOI: 10.18653/v1/2021.acl-short.96
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Abstract: Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in general. In this paper, we combine contextualized representations with neural topic models. We find that our approach…
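The combination the abstract describes is implemented in the authors' contextualized-topic-models package. Below is a minimal sketch assuming that package's documented CombinedTM interface; the SBERT checkpoint, the example documents, and the topic count are illustrative choices, not values from the paper.

```python
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

raw_docs = ["Contextual embeddings improve neural topic models.",
            "Bag-of-words input alone often yields incoherent topics."]
preprocessed = ["contextual embeddings improve neural topic models",
                "bag words input alone often yields incoherent topics"]

# SBERT encodes the raw text; the BoW side uses the preprocessed text.
tp = TopicModelDataPreparation("bert-base-nli-mean-tokens")
dataset = tp.fit(text_for_contextual=raw_docs, text_for_bow=preprocessed)

# The model consumes both the BoW vector and the 768-d contextual embedding.
ctm = CombinedTM(bow_size=len(tp.vocab), contextual_size=768, n_components=10)
ctm.fit(dataset)
print(ctm.get_topic_lists(5))  # top-5 words per discovered topic
```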

Cited by 140 publications (78 citation statements) · References 31 publications

Citation statements, ordered by relevance:
“…a variable that can have a non-linear association with the distribution of how prevalent each topic is over time). Notably, we select the structural topic model as opposed to more recent topic modeling frameworks that incorporate more recent innovations in the natural language processing literature, in particular those that incorporate new and more rapid approaches for model optimization (Card, Tan, and Smith 2018) and those that incorporate contextualized word embeddings (Bianchi, Terragni, and Hovy 2021). We opt for the STM over these other approaches for two reasons.…”
Section: Identifying Topics in Local News Coverage (mentioning)
confidence: 99%
“…The third group has become popular in the last years with the rise of deep learning. Neural topic modeling is based on Variational Auto-Encoders (VAE) [4,17,30,33]. Typically, an encoder such as a MultiLayer Perceptron (MLP) compresses the Bag-of-Words (BoW) document representation into a continuous vector.…”
Section: Related Work (mentioning)
confidence: 99%
“…Typically, an encoder such as a MultiLayer Perceptron (MLP) compresses the Bag-of-Words (BoW) document representation into a continuous vector. Then, a decoder reconstructs the document by generating words independently [4,17]. Negative sampling and Quantization Topic Model (NQTM) [33], the latest topic modeling technique on short texts brings two contributions which yielded the current SOTA results.…”
Section: Related Work (mentioning)
confidence: 99%
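The quoted description of VAE-based neural topic modeling maps onto a small amount of PyTorch. The following is a minimal ProdLDA-style sketch of that scheme, not a reimplementation of the cited systems (and not NQTM): an MLP encoder compresses the BoW vector into a continuous latent, and a linear decoder reconstructs the document by generating each word independently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAETopicModel(nn.Module):
    """ProdLDA-style VAE topic model: an illustrative sketch of the scheme
    the quote describes, not the cited systems themselves."""

    def __init__(self, vocab_size: int, n_topics: int = 50, hidden: int = 200):
        super().__init__()
        # Encoder: an MLP compresses the BoW document into a continuous vector.
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
        )
        self.mu = nn.Linear(hidden, n_topics)
        self.logvar = nn.Linear(hidden, n_topics)
        # Decoder: a topic-word matrix reconstructs the document,
        # generating each word independently from the topic mixture.
        self.decoder = nn.Linear(n_topics, vocab_size, bias=False)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        theta = F.softmax(z, dim=-1)                          # document-topic mixture
        log_recon = F.log_softmax(self.decoder(theta), dim=-1)
        recon = -(bow * log_recon).sum(-1)                    # multinomial NLL
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean()

model = VAETopicModel(vocab_size=2000)
loss = model(torch.rand(8, 2000))  # a batch of 8 toy BoW vectors
loss.backward()
```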
“…Following (Bianchi et al., 2021a), we use the contextualized document representations derived from SentenceBERT (Reimers and Gurevych, 2019). We use the pre-trained BERT model fine-tuned on the natural language inference (NLI) task.…”
Section: Model (mentioning)
confidence: 99%
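That representation step is a one-liner with the sentence-transformers library. A minimal sketch; the checkpoint name is an assumption chosen to match "BERT fine-tuned on NLI", not necessarily the exact model the citing paper used.

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint: an SBERT model whose underlying BERT was fine-tuned
# on NLI, matching the setup the quote describes.
encoder = SentenceTransformer("bert-base-nli-mean-tokens")
embeddings = encoder.encode(["Documents are fed to SBERT without preprocessing."])
print(embeddings.shape)  # (1, 768): one 768-d contextualized document vector
```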