Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.98

Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks?

Abstract: Do state-of-the-art natural language understanding models care about word order? Not always! We found 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain constant after input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to classification is almost unchanged even after its surrounding words are shuffled. BERT-based models exploit superficial cues (e.g. the sentiment of keywords in sentiment …
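To make the probe concrete, here is a minimal sketch of the shuffling experiment the abstract describes: shuffle a sentence's words and check whether a pretrained classifier's prediction changes. The model and example sentence are placeholders, not the authors' exact setup.

```python
# Minimal sketch of the word-shuffling probe described in the abstract.
# The model and example sentence are placeholders, not the paper's setup.
import random

from transformers import pipeline  # pip install transformers

clf = pipeline("sentiment-analysis")  # any BERT-based classifier will do

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its words in a random order."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

original = "the movie was surprisingly good despite its slow start"
shuffled = shuffle_words(original)

# The paper reports that 75% to 90% of correct predictions survive this
# kind of shuffling across many GLUE tasks.
print(original, "->", clf(original)[0]["label"])
print(shuffled, "->", clf(shuffled)[0]["label"])
```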

Cited by 43 publications (44 citation statements). References 35 publications.

“…This performance gap closes appreciably as we perform more structured syntactic shifts such as reversing the sentence (a drop of 10%), or systematically permuting word orders using the dependency tree (a drop of between 7% and 9%). Rather than being invariant to word orders across natural language understanding tasks (Sinha et al., 2021; Pham et al., 2021), we instead find that BERT-based models are in fact sensitive to word order, at least for the tasks in the GLUE benchmark. In addition, we find that continued pretraining can close the performance gap to all but a few percentage points for tree-based structural shifts.…”
Section: Syntax Matters But Not Too Much
Citation type: contrasting
confidence: 60%
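The structured shifts this excerpt contrasts with random shuffling can be sketched as follows. The dependency-tree permutation below is one plausible reading of "systematically permuting word orders using the dependency tree" (each head's subtree stays contiguous while the order of head and siblings is shuffled), not the citing paper's exact algorithm; spaCy and the example sentence are assumptions.

```python
# Illustrative structured shifts: sentence reversal, and a word-order
# permutation that respects the dependency tree. This is one plausible
# reading of the citing paper's setup, not its exact algorithm.
import random

import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def reverse_words(sentence: str) -> str:
    return " ".join(reversed(sentence.split()))

def tree_permute(sentence: str, seed: int = 0) -> str:
    """Linearize the dependency tree with each head's children shuffled,
    so every subtree stays contiguous but sibling order is randomized."""
    rng = random.Random(seed)
    doc = nlp(sentence)

    def walk(token):
        nodes = [token] + list(token.children)
        rng.shuffle(nodes)
        out = []
        for n in nodes:
            out.extend([n.text] if n is token else walk(n))
        return out

    roots = [t for t in doc if t.head is t]  # sentence roots
    return " ".join(w for r in roots for w in walk(r))

s = "the cat chased the small mouse"
print(reverse_words(s))
print(tree_permute(s))
```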
“…While syntax is a crucial aspect of language, studies have also shown syntactic typology to be surprisingly non-predictive of transfer quality (Pham et al., 2021), and other studies have shown LLMs to be largely word-order invariant (Sinha et al., 2021). We investigate a set of syntactic transformations that isolate syntactic word-order shifts from the other factors that can vary between languages, such as tokenization, static embeddings, and morphological representation.…”
Section: Syntactic Shifts
Citation type: mentioning
confidence: 99%
“…Circumstantial evidence for the redundancy of word order comes from work such as that of Niven and Kao (2019), which showed that language models' predictions in certain tasks are largely explained by word-level triggers. Concurrently with this work, Sinha et al. (2021a,b), Pham et al. (2021), and Gupta et al. (2021) probed and demonstrated, in various ways, the surprising insensitivity of infilling LMs' performance on GLUE tasks to word order in training and evaluation data. These studies complement our discovery that nearly all of models' accuracy on GLUE tasks can be explained by bags of words only (§5.2), showing that word order rarely carries information useful for classifying textual similarity, entailment, or sentiment.…”
Section: Related Work
Citation type: mentioning
confidence: 59%
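The bag-of-words claim in this excerpt can be made concrete with a tiny order-free baseline: a linear classifier over word counts, which by construction cannot react to word order. The library choice and toy data below are illustrative; the cited work evaluates on GLUE tasks.

```python
# Sketch of an order-free bag-of-words baseline: a linear classifier over
# word counts. Toy data is illustrative; the cited work evaluates on GLUE.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "a gripping and heartfelt drama",
    "heartfelt drama a gripping and",  # same bag of words, same features
    "a dull and lifeless mess",
]
labels = [1, 1, 0]  # 1 = positive, 0 = negative

bow_clf = make_pipeline(CountVectorizer(), LogisticRegression())
bow_clf.fit(texts, labels)

# Any shuffle of a sentence maps to the same count vector, so the
# prediction cannot change under reordering.
print(bow_clf.predict(["drama heartfelt gripping a and"]))
```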
“…There is substantial evidence that RoBERTa is able to associate abstract constructional templates with their meaning without lexical cues. This result is perhaps surprising, given that previous work found that LMs are relatively insensitive to word order in compositional phrases (Yu and Ettinger, 2020) and downstream inference tasks (Sinha et al., 2021; Pham et al., 2021), where their performance can be largely attributed to lexical overlap.…”
Section: Potential Confounds
Citation type: mentioning
confidence: 77%