2020
DOI: 10.1162/tacl_a_00298

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

Abstract: Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pretraining processes confer upon models. In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can general…
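As a rough illustration of the kind of cloze-style diagnostic the abstract describes, the sketch below (not the paper's released code; it assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint, and uses an illustrative context rather than the paper's stimuli) feeds BERT a context with a masked position and prints its top predicted completions.

```python
# Minimal sketch of a cloze-style probe for a masked language model.
# Assumptions (not from the paper): Hugging Face `transformers` is installed
# and the public `bert-base-uncased` checkpoint is used.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# An illustrative context; [MASK] marks the position whose predicted
# completions we want to inspect.
context = (
    "He was so tired after the long shift that as soon as he got home "
    "he fell straight into his [MASK]."
)

for prediction in fill_mask(context, top_k=5):
    print(f"{prediction['token_str']:>12}  p={prediction['score']:.3f}")
```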

Cited by 454 publications (446 citation statements)
References 32 publications
“…At a high level, mechanisms for interpreting BERT can be categorized into three main categories: interpreting the learned embeddings [1,16,23,49,86], BERT's learned knowledge of syntax [27,30,32,45,47,76], and BERT's learned knowledge of semantics [24,76].…”
Section: Interpreting Models in NLP (mentioning)
confidence: 99%
“…These high performance levels typically come at the cost of decreased interpretability. Such neural nets are notoriously prone to learning irrelevant correlations (Ettinger, 2020; Futrell et al, 2019; Kuncoro et al, 2018; van Schijndel, Mueller, & Linzen, 2019). To avoid this problem and focus our investigation more squarely on structural constraints like locality in Grodner and Gibson (2005) and non‐structural factors such as animacy in Traxler et al (2002), we instead proceed with an explicit grammar whose generalization ability rests upon well‐chosen syntactic analyses.…”
Section: From Grammar to Processing Difficulty Predictions (mentioning)
confidence: 99%
“…However, it is now known that these models lack reasoning capabilities, often simply exploiting statistical artifacts in the data sets, instead of actually understanding language (Niven and Kao, 2019;McCoy et al, 2019). Moreover, Ettinger (2020) found that the popular BERT model (Devlin et al, 2019) completely failed to acquire a general understanding of negation. Related, Bender and Koller (2020) contend that meaning cannot be learned from form alone, and argue for approaches that focus on grounding the language (communication) in the real world.…”
Section: Introduction (mentioning)
confidence: 99%
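To make the negation finding quoted above concrete, here is a small hedged sketch (again assuming Hugging Face transformers and bert-base-uncased; the sentence pairs are illustrative, not the paper's actual stimuli) that compares BERT's top completions for an affirmative context and its negated counterpart. A model that registered the negation would shift its predictions between the two; Ettinger (2020) reports that BERT largely does not.

```python
# Minimal sketch of a negation probe: compare top completions for an
# affirmative context and its negated counterpart. Assumptions (not from the
# paper): Hugging Face `transformers`, `bert-base-uncased`, and illustrative
# sentence pairs rather than the paper's actual stimuli.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

pairs = [
    ("A robin is a [MASK].", "A robin is not a [MASK]."),
    ("A hammer is a [MASK].", "A hammer is not a [MASK]."),
]

for affirmative, negated in pairs:
    for sentence in (affirmative, negated):
        completions = ", ".join(
            p["token_str"] for p in fill_mask(sentence, top_k=3)
        )
        print(f"{sentence:<28} -> {completions}")
```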