Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.338

Context Analysis for Pre-trained Masked Language Models

Abstract: Pre-trained language models that learn contextualized word representations from a large unannotated corpus have become a standard component for many state-of-the-art NLP systems. Despite their successful applications in various downstream NLP tasks, the extent of contextual impact on the word representation has not been explored. In this paper, we present a detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models. We follow two different approaches to evaluate the impact of…
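As a rough illustration of the kind of probe the abstract describes, here is a minimal sketch (not the authors' code) of measuring how a masked LM's prediction changes as the visible context grows. The HuggingFace transformers library, the bert-base-uncased checkpoint, and the whole probe design are assumptions for illustration:

```python
# Minimal sketch: how much does surrounding context help a masked LM
# recover a word? Grow the context window and watch the probability.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_prob(words, target_idx, window):
    """Probability of the target word when only `window` words of
    context are kept on each side of the masked position."""
    lo, hi = max(0, target_idx - window), min(len(words), target_idx + window + 1)
    ctx = words[lo:hi]
    ctx[target_idx - lo] = tokenizer.mask_token  # replace target with [MASK]
    inputs = tokenizer(" ".join(ctx), return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    # Assumes the target word is a single wordpiece in BERT's vocabulary.
    target_id = tokenizer.convert_tokens_to_ids(words[target_idx])
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return torch.softmax(logits, dim=-1)[target_id].item()

sent = "the quick brown fox jumps over the lazy dog".split()
for w in (1, 2, 4):  # widen the visible context
    print(w, masked_prob(sent, target_idx=4, window=w))
```

Plotting such probabilities against window size for many tokens gives one simple, hedged view of how far a model's usable context extends, in the spirit of the paper's comparison between Transformer- and BiLSTM-based models.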


Cited by 11 publications (10 citation statements)
References 30 publications
“…Sun et al. (2021) reveal that Longformer and Routing Transformers can only reduce the perplexity of LMs on a small set of tokens. More related to our work, Lai et al. (2020) show that BERT can make use of a larger scope of context than a BiLSTM.…”
Section: Benchmarks and Analysis (supporting)
confidence: 53%
“…show that Compressive Transformer improves the performance of infrequent tokens. Our work also relates to that of Lai et al. (2020), who investigate the impact of context for pretrained masked LMs. More recently, Press et al. (2020) also observe negligible benefits of long-term context; we step further in this direction by exploring larger models with more fine-grained analysis.…”
Section: Related Work (mentioning)
confidence: 96%
“…The intuition is, for example, if a pretrained encoder has learned to discard the input information, we cannot expect the encoder to perform well when transferred to any tasks. Also, existing studies show that neural language models assign more importance to local context when they make predictions (Khandelwal et al., 2018; Lai et al., 2020). Can we observe that encoders pretrained with artificial languages exhibit similar patterns to natural languages regarding how they encode the contextual information?…”
Section: Results (mentioning)
confidence: 74%
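The local-context effect mentioned in the last statement can also be sketched: shuffle the words far from the masked position and compare the model's confidence before and after. This is an illustrative approximation in the spirit of Khandelwal et al. (2018), not their exact procedure; the model, sentence, and window size are assumptions:

```python
# Hedged sketch of a local-vs-distant context probe: shuffle words far
# from the masked position and see how much the prediction degrades.
import random
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def prob_with_context(words, target_idx):
    """Probability of the original word at target_idx when it is masked."""
    ctx = list(words)
    ctx[target_idx] = tokenizer.mask_token
    inputs = tokenizer(" ".join(ctx), return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    target_id = tokenizer.convert_tokens_to_ids(words[target_idx])
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return torch.softmax(logits, dim=-1)[target_id].item()

words = "in the early morning the quick brown fox jumps over the lazy dog".split()
target, near = 8, 3  # predict "jumps"; keep 3 words on each side intact
far = [i for i in range(len(words)) if abs(i - target) > near]
shuffled = list(words)
vals = [shuffled[i] for i in far]
random.shuffle(vals)  # perturb only the distant context
for i, v in zip(far, vals):
    shuffled[i] = v
print("intact:  ", prob_with_context(words, target))
print("shuffled:", prob_with_context(shuffled, target))
```

If the two probabilities are close, the model is, on this example, leaning mostly on local context, which is the pattern those studies report.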