Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.227

Life after BERT: What do Other Muppets Understand about Language?

Abstract: Existing pre-trained transformer analysis works usually focus only on one or two model families at a time, overlooking the variability of the architecture and pre-training objectives. In our work, we utilize the oLMpics benchmark and psycholinguistic probing datasets for a diverse set of 29 models including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive models and evaluate GPT networks of different sizes. Our findings show that none of these models can resolve compositional questions in a zero-shot fashion, suggesting that this skill is not learnable using existing pre-training objectives. Furthermore, global model decisions such as architecture, directionality, size of the dataset, and pre-training objective are not predictive of a model's linguistic capabilities.
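As a hedged illustration of the zero-shot probing setup the abstract describes, the sketch below scores the candidate answers of an oLMpics-style multiple-choice item with a masked LM and, for the autoregressive adaptation, by comparing sequence log-likelihoods with GPT-2. The model names, the example item, and the scoring details are assumptions for illustration, not the paper's exact protocol.

# Minimal sketch of oLMpics-style zero-shot probing (not the authors' exact code).
# A masked LM scores candidate answers at the [MASK] position; an autoregressive
# model scores each candidate by the log-likelihood of the filled-in sentence.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, AutoModelForCausalLM

question = "A 25 year old person is [MASK] than a 21 year old person."
candidates = ["older", "younger"]  # assumed to be single tokens in the MLM vocabulary

# --- Masked-LM scoring (e.g. a BERT-style model) ---
mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

inputs = mlm_tok(question, return_tensors="pt")
mask_pos = (inputs.input_ids == mlm_tok.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]
mlm_scores = {c: logits[mlm_tok.convert_tokens_to_ids(c)].item() for c in candidates}

# --- Autoregressive scoring (e.g. GPT-2): total log-probability of each completion ---
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(text):
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)
    # out.loss is the mean token NLL; multiply by the number of predicted tokens
    return -out.loss.item() * (ids.shape[1] - 1)

ar_scores = {c: sentence_logprob(question.replace("[MASK]", c)) for c in candidates}

print("MLM prediction:", max(mlm_scores, key=mlm_scores.get))
print("GPT-2 prediction:", max(ar_scores, key=ar_scores.get))

In both cases the model is used purely zero-shot: no parameters are updated, and the prediction is whichever candidate receives the higher score.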

Cited by 2 publications (7 citation statements) | References 36 publications
“…Han et al. (2021) and Liu et al. (2022) prompted GPT3 to generate synthetic translation and NLI datasets, respectively. Lialin et al. (2022) and Ettinger (2019) evaluated language models on smaller datasets for negation and role reversal. We extend these datasets to around 1500 data points and evaluate 22 models, including GPT3.…”
Section: Related Work
Confidence: 99%
“…The field of analysis of pre-trained models has grown rapidly in recent years (Zagoury et al., 2021; Liu et al., 2021; Lialin et al., 2022; bench authors, 2023; Rogers et al., 2020). Methods such as attention pattern analysis (Kovaleva et al., 2019; Kobayashi et al., 2020), linear probing (Tenney et al., 2019), and zero-shot probing (Belinkov et al., 2020; Talmor et al., 2019; Ettinger, 2019; Lialin et al., 2022) allow us to evaluate specific capabilities of pre-trained models.…”
Section: Introduction
Confidence: 99%
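The quoted passage lists linear probing among the analysis methods for pre-trained models. As a hedged sketch under assumed names and toy data (not any cited paper's implementation), the example below trains a logistic-regression probe on frozen encoder representations for a singular-vs.-plural toy task; probe accuracy then reflects what the frozen representation already encodes.

# Minimal sketch of linear probing on frozen hidden states.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

sentences = ["The cat sleeps.", "The cats sleep.", "A dog barks.", "Dogs bark."]
labels = [0, 1, 0, 1]  # toy task: singular (0) vs. plural (1) subject

def cls_embedding(text, layer=-1):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**ids, output_hidden_states=True).hidden_states[layer]
    return hidden[0, 0].numpy()  # [CLS] vector from the chosen layer, encoder stays frozen

X = [cls_embedding(s) for s in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("probe train accuracy:", probe.score(X, labels))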