Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen 2019
DOI: 10.18653/v1/d19-1286

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

Abstract: Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing in English, as a case study for our experiments. NPIs like any are grammatical only if they appear in a licensing environment like ne…

Cited by 95 publications (123 citation statements)
References 33 publications
“…Within the paradigm of training large pretrained Transformer language representations via intermediate-stage training before fine-tuning on a target task, positive transfer has been shown in both sequential task-to-task (Phang et al, 2018) and multi-task-to-task (Raffel et al, 2019) formats. Wang et al (2019a) perform an extensive study on transfer with BERT, finding language modeling and NLI tasks to be among the most beneficial tasks for improving target-task performance. Talmor and Berant (2019) perform a similar cross-task transfer study on reading comprehension datasets, finding similar positive transfer in most cases, with the biggest gains stemming from a combination of multiple QA datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unsupervised pretraining-e.g., BERT (Devlin et al, 2019) or RoBERTa (Liu et al, 2019b)-has recently pushed the state of the art on many natural language understanding tasks. One method of further improving pretrained models that has been shown to be broadly helpful is to first fine-tune a pretrained model on an intermediate task, before fine-tuning again on the target task of interest (Phang et al, 2018; Wang et al, 2019a; Clark et al, 2019a; Sap et al, 2019), also referred to as STILTs.…”
Section: Introduction (mentioning)
confidence: 99%
“…These large corpora have been used as part of larger benchmark sets, e.g., GLUE (Wang et al, 2018), and have proven useful for problems beyond NLI, such as sentence representation and transfer learning (Conneau et al, 2017;Subramanian et al, 2018;Reimers and Gurevych, 2019), automated question-answering (Khot et al, 2018;Trivedi et al, 2019) and model probing (Warstadt et al, 2019;Geiger et al, 2020;Jeretic et al, 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…To evaluate and promote the robustness of neural models against noise, some studies manually create new datasets with specific linguistic phenomena (Linzen et al, 2016;Marvin and Linzen, 2018;Goldberg, 2019;Warstadt et al, 2019a). Others have introduced various methods to generate synthetic errors on clean downstream datasets, in particular, machine translation corpora.…”
Section: Synthesized Errors (mentioning)
confidence: 99%