Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022
DOI: 10.18653/v1/2022.acl-short.71

When classifying grammatical role, BERT doesn’t care about word order... except when it matters


Citation types: 3 supporting, 8 mentioning, 0 contrasting
Cited by 9 publications (11 citation statements)
References 8 publications

“…The fact that sentence representations of later model layers are more suitable for decoding plausibility than those of earlier layers is consistent with previous results showing that semantic information tends to be encoded more strongly in later layers (Belinkov et al., 2017; Papadimitriou, Futrell, & Mahowald, 2022; Tenney, Das, & Pavlick, 2019). The trend we observed in one of the models, MPT, where mid-layer performance exceeded late-layer performance, should be examined further in other large (multi-billion parameter) models.…”
Section: Internal Representations Of Event Plausibility Generalize Ac... (supporting)
confidence: 91%
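The layer-wise decoding setup this quote describes can be illustrated with a short probing script. This is a minimal sketch, assuming a BERT-style encoder from Hugging Face transformers; the model name, the mean-pooling strategy, and the two toy sentences are placeholder assumptions, not the cited papers' actual materials or method.

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy stand-ins for plausible (1) vs. implausible (0) event descriptions.
sentences = ["The chef cooked the meal.", "The meal cooked the chef."]
labels = [1, 0]

def layer_embeddings(sentence):
    """Return one mean-pooled sentence vector per layer (embedding layer included)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # tuple of (n_layers + 1) tensors, each [1, T, H]
    return [h.mean(dim=1).squeeze(0).numpy() for h in hidden]

# Fit one linear probe per layer; with a real dataset you would cross-validate
# and compare held-out accuracy across layers rather than training accuracy.
for layer, vecs in enumerate(zip(*(layer_embeddings(s) for s in sentences))):
    X = np.stack(vecs)
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {layer:2d}: train accuracy = {probe.score(X, labels):.2f}")

Under this setup, the quoted finding corresponds to per-layer probe accuracy rising toward the later layers (with the noted MPT exception, where mid layers peaked).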
“…A probe that is trained on a mix of active and passive sentences performs as successfully as the probe trained and tested on only one voice type, suggesting that plausible and implausible sentence embeddings in late LLM layers are linearly separable by the same hyperplane across syntactic structures. This finding aligns with recent computational work showing that even though most sentences in the language input describe prototypical events, LLMs are able to correctly represent the argument structure of nonprototypical event descriptions in late layers (Papadimitriou et al., 2022).…”
Section: LLMs Can Infer Thematic Roles (supporting)
confidence: 90%
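The cross-voice generalization test in this quote amounts to fitting a single linear probe on mixed-voice embeddings and scoring the same hyperplane on each voice separately. Below is a minimal sketch of that logic only; the embed() function, the 768-dimensional random vectors, and the example sentences are hypothetical placeholders for a real late-layer encoder and dataset.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def embed(sentences):
    # Placeholder for extracting late-layer LLM sentence embeddings;
    # random vectors here only demonstrate the shapes involved.
    return rng.normal(size=(len(sentences), 768))

active = ["The cop arrested the thief.", "The thief arrested the cop."]
passive = ["The thief was arrested by the cop.", "The cop was arrested by the thief."]
y = [1, 0]  # plausible = 1, implausible = 0 (illustrative labels)

X_active, X_passive = embed(active), embed(passive)

# Train one linear probe on a mix of both voices ...
probe = LogisticRegression(max_iter=1000).fit(np.vstack([X_active, X_passive]), y + y)

# ... then score the same hyperplane on each voice separately.
# (A real evaluation would score held-out sentences, not the training items.)
for name, X in [("active", X_active), ("passive", X_passive)]:
    print(f"{name}: accuracy = {probe.score(X, y):.2f}")

If one hyperplane scores well on both voices, the plausible/implausible distinction is encoded along a voice-independent direction, which is the quoted finding.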
“…Word order is a crucial aspect of natural language, and studies have investigated its impact on language models by perturbing word order (Sinha et al., 2021b; Pham et al., 2021; Gupta et al., 2021; Hessel and Schofield, 2021; Clouatre et al., 2022; Yanaka and Mineshima, 2022; Papadimitriou et al., 2022). Negheimish et al. (2023) try to preserve the importance of word order by forcing the model to identify permuted sequences as invalid samples. In summary, existing works have consistently found that breaking word order does not result in a significant decrease in task performance.…”
Section: Related Work (mentioning)
confidence: 99%
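The perturbation these studies apply is, in its simplest form, a random permutation of the words in each input; some of the cited works also use more local shuffles (e.g., within n-grams). A minimal sketch, with an illustrative sentence and whole-sentence shuffling as the assumed condition:

import random

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its words randomly permuted."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

original = "the chef who the critic praised cooked the meal"
print(shuffle_words(original))  # prints some permutation of the words above

# A typical experiment feeds both the original and the shuffled version to a
# fine-tuned classifier and compares task accuracy; the works cited above
# report that the drop is often surprisingly small.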
“…Word order, referring to the sequential order of individual words within a text, is a fundamental concept in natural language. Previous works have investigated word order's impact by altering it, and they find that whether the order is altered in the pre-training data or in the training/inference data, downstream task performance drops only marginally (Sinha et al., 2021a,b; Pham et al., 2021; Gupta et al., 2021; Hessel and Schofield, 2021; Clouatre et al., 2022; Yanaka and Mineshima, 2022; Papadimitriou et al., 2022), a counter-intuitive and unnatural phenomenon.…”
Section: Introduction (mentioning)
confidence: 98%