Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021
DOI: 10.18653/v1/2021.emnlp-main.119

Sorting through the noise: Testing robustness of information processing in pre-trained language models

Abstract: Pre-trained LMs have shown impressive performance on downstream NLP tasks, but we have yet to establish a clear understanding of their sophistication when it comes to processing, retaining, and applying information presented in their input. In this paper we tackle a component of this question by examining robustness of models' ability to deploy relevant context information in the face of distracting content. We present models with cloze tasks requiring use of critical context information, and introduce distrac…
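The cloze-with-distractor setup the abstract describes can be illustrated with a minimal stimulus-construction sketch. The sentences and the `build_cloze` helper below are hypothetical illustrations, not the paper's actual stimuli or code:

```python
from typing import Optional

# Assemble cloze prompts in which a critical context fact licenses the
# masked word, optionally interleaving an irrelevant distractor sentence.
# All example sentences are invented for illustration only.

def build_cloze(context: str, distractor: Optional[str], query: str) -> str:
    """Join context, optional distractor, and a query containing [MASK]."""
    parts = [context]
    if distractor is not None:
        parts.append(distractor)  # topically similar but irrelevant content
    parts.append(query)
    return " ".join(parts)

context = "John grabbed his umbrella because rain was forecast."
distractor = "His sister once lost a sunhat at the beach."
query = "Stepping outside, John opened his [MASK]."

clean = build_cloze(context, None, query)        # no distraction
noisy = build_cloze(context, distractor, query)  # distractor inserted
```

A robustness probe of the kind described would then compare a masked LM's fill-in predictions for the `clean` versus `noisy` prompts; a robust model should predict "umbrella" in both.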

Cited by 15 publications (12 citation statements)
References: 29 publications
“…Through careful analyses of their output, we can assess just how much can be learned from the statistical regularities of the linguistic environment (Futrell et al., 2019; Wilcox, Futrell, & Levy, 2022). Some of this work has already been done in the context of encoder‐only masked language models, such as BERT and its related descendants (Ettinger, 2020; Pandia & Ettinger, 2021; Rogers, Kovaleva, & Rumshisky, 2020). Their failures, such as with semantic coherence or pragmatics (Arehalli, Dillon, & Linzen, 2022; Dou, Forbes, Koncel‐Kedziorski, Smith, & Choi, 2022; McClelland et al., 2020), are also interesting and point to other central tenets of usage‐based theories such as the role of environmental contexts, developmental histories, cognitive machinery, and functional pressures in human language learning and use (Christiansen & Chater, 2022).…”
mentioning
confidence: 99%
“…Through our experiments, we found this criterion to be met in a majority of the cases, suggesting a strong capacity of models to demonstrate property inheritance. However, post-hoc analyses revealed that for most models, this capacity drastically decreases in the presence of distracting information (sometimes even worse than random-guessing), suggesting a clear lack of robustness in the information processing capacities of PLMs, similar to the results of Pandia and Ettinger (2021). In contrast to their results, we find that larger models are generally more distracted than are smaller models, and this especially happens when the distracting information is closer to the predicted property-phrases, suggesting the presence of a proximity effect.…”
Section: General Discussion and Conclusion
mentioning
confidence: 58%
“…To what extent does the compatibility of models with H1 hold in presence of distracting information? This question is inspired by Pandia and Ettinger (2021), who report a substantial decrease in perceived information processing capacity of PLMs in the presence of semantic distractors. Here, we transform the stimuli of COMPS-WUGS by creating two different subordinates for every minimal pair: one for the positive concept (e.g.…”
Section: Post-hoc Robustness Evaluation With Distracting Information
mentioning
confidence: 99%
“…This again suggests some isomorphism between human language processing and DL-based models. The next word prediction objective also enables language models to perform well on psycholinguistic diagnostics like the cloze task, although there is substantial room for improvement (Ettinger, 2020; Pandia & Ettinger, 2021). Finally, self-supervised ANNs, that is, networks that predict the next word or speech frame, transfer well to downstream language tasks like question answering and coreference resolution, and to speech tasks like speaker verification and translation across languages (Z. Chen et al., 2022; A. Wu et al., 2020).…”
Section: Experimental Designs in Language Neuroscience
mentioning
confidence: 99%