Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.206
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media

Abstract: Language use differs between domains, and even within a domain, language use changes over time. For pre-trained language models like BERT, domain adaptation through continued pre-training has been shown to improve performance on in-domain downstream tasks. In this article, we investigate whether temporal adaptation can bring additional benefits. For this purpose, we introduce a corpus of social media comments sampled over three years. It contains unlabelled data for adaptation and evaluation on an upstream mask…
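The adaptation step the abstract describes, continued pre-training on unlabelled, time-sliced data before downstream fine-tuning, can be sketched roughly as follows. This is an illustrative outline using the Hugging Face transformers and datasets libraries, not the authors' released code; the file name comments_2019.txt and all hyperparameters are assumptions.

```python
# Minimal sketch of temporal adaptation via continued masked-language-model
# pre-training. File names and hyperparameters are illustrative only.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Unlabelled comments from a single time slice (e.g. one year), one text per line.
dataset = load_dataset("text", data_files={"train": "comments_2019.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the standard 15% masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-adapted-2019",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()

# The adapted checkpoint in "bert-adapted-2019" can then be fine-tuned on a
# downstream classification task from the same (or a later) time period.
```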

Cited by 34 publications (54 citation statements) · References 42 publications

Citation statements, ordered by relevance:
“…Several recent studies have explored and evaluated the generalization ability of language models to time (Röttger and Pierrehumbert, 2021; Lazaridou et al., 2021; Agarwal and Nenkova, 2021). To better handle continuously evolving web content, Hombaiah et al. (2021) performed incremental training.…”
Section: Temporal Language Models (mentioning)
confidence: 99%
“…They focused on two temporal tasks: semantic change detection and sentence time prediction. Others focused on document classification by using word-level temporal embeddings (Huang and Paul, 2019) and adapting pretrained BERT models to domain and time (Röttger and Pierrehumbert, 2021).…”
Section: Temporal Language Models (mentioning)
confidence: 99%
“…The global changes captured with language model perplexity and analysis of individual words cannot indicate how these changes impact the performance of a model trained for a given task. Röttger and Pierrehumbert (2021) present a meticulously executed study of how domain change (topic of discussion) influences both language models and a downstream classification task. They show that even big changes in language model perplexity may lead to small changes in downstream task performance.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…The only problem is that a single test year is chosen, and any anomaly in that test year may lead to misleading results. An evaluation setup much like the one in Huang and Paul (2018), or the much more recent work in Röttger and Pierrehumbert (2021), is needed to draw robust conclusions. As noted above, we adopt their setup with some changes and additionally provide visualization plots to easily interpret trends.…”
Section: Does Performance Deteriorate Over Time? (mentioning)
confidence: 99%
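The evaluation setup referenced in the last citation statement, training on one period and measuring performance on each subsequent year rather than on a single test year, can be sketched as below. This is a generic illustration, not the setup from any of the cited papers: the record format ("text", "label", "year"), the TF-IDF plus logistic-regression classifier, and the macro-F1 metric are all assumptions made for the example.

```python
# Illustrative sketch of a temporal evaluation setup: train once on an early
# time slice, then evaluate separately on every later year so that a single
# anomalous test year cannot dominate the conclusions.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score


def temporal_evaluation(documents, train_year, test_years):
    """Train on one year, report macro-F1 for each later test year."""
    by_year = defaultdict(list)
    for doc in documents:
        by_year[doc["year"]].append(doc)

    train = by_year[train_year]
    vectorizer = TfidfVectorizer(min_df=2)
    X_train = vectorizer.fit_transform(d["text"] for d in train)
    y_train = [d["label"] for d in train]

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    scores = {}
    for year in test_years:
        test = by_year[year]
        X_test = vectorizer.transform(d["text"] for d in test)
        y_test = [d["label"] for d in test]
        scores[year] = f1_score(y_test, clf.predict(X_test), average="macro")
    return scores  # e.g. {2018: 0.71, 2019: 0.68, 2020: 0.66} exposes a trend
```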