2021 | Preprint | DOI: 10.48550/arxiv.2103.06922

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models

Abstract: Recent studies indicate that NLU models are prone to relying on shortcut features for prediction, without achieving true language understanding. As a result, these models fail to generalize to real-world out-of-distribution data. In this work, we show that the words in the NLU training set can be modeled as a long-tailed distribution. There are two findings: 1) NLU models have a strong preference for features located at the head of the long-tailed distribution, and 2) shortcut features are picked up during very early…
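As a minimal sketch of the long-tailed premise in the abstract: word frequencies in an NLU training set follow a long-tailed distribution, and a small head of very frequent words is where shortcut features tend to live. The toy corpus and the 5% head cutoff below are illustrative assumptions, not the paper's setup.

```python
# Sketch: model training-set word frequencies as a long-tailed distribution
# and take the top-ranked words as the "head", where shortcut features live.
from collections import Counter

def head_words(train_sentences, head_fraction=0.05):
    """Return the head of the word-frequency distribution plus full counts."""
    counts = Counter(w for s in train_sentences for w in s.lower().split())
    ranked = [w for w, _ in counts.most_common()]
    cutoff = max(1, int(len(ranked) * head_fraction))  # top 5% by rank (assumed cutoff)
    return set(ranked[:cutoff]), counts

sentences = ["the movie was great", "the plot was boring", "great acting overall"]
head, counts = head_words(sentences)
print(head)  # highest-frequency words, e.g. "the"
```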

Cited by 5 publications (6 citation statements) | References 41 publications
“…For example, in [27], a framework named COSOC was proposed to tackle the shortcut problem by extracting the foreground objects in images, eliminating background-related shortcuts via a contrastive learning approach. [7] proposed a measurement for quantifying the shortcut degree, with which a shortcut mitigation framework was introduced for natural language understanding (NLU). [47] forces the network to learn the necessary features for all the words in the input, alleviating shortcut learning in supervised paraphrase identification (PI).…”
Section: Shortcut Learning (mentioning, confidence: 99%)
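To make the idea behind the shortcut degree in [7] concrete, here is a hedged sketch of one possible proxy: the fraction of a sample's words that fall in the head of the long-tailed training vocabulary. The `shortcut_degree` helper and the head vocabulary are illustrative assumptions; this is not the measurement actually defined in [7].

```python
# Illustrative proxy: the more of a sample's words lie in the head of the
# long-tailed training distribution, the more "shortcut-prone" the sample.
# NOT the exact metric of [7]; a simplified stand-in for exposition only.
def shortcut_degree(sentence, head_vocab):
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(w in head_vocab for w in words) / len(words)

print(shortcut_degree("the movie was great", {"the", "was", "great"}))  # 0.75
```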
“…Besides, NLI systems relying on superficial syntactic properties (e.g., the lexical overlap heuristic, the subsequence heuristic, the constituent heuristic) may succeed on the majority of examples (McCoy et al., 2019; Clark et al., 2019; Utama et al., 2020; Pezeshkpour et al., 2021). On quite a few NLP tasks composed of several components, it has been observed that models fed with partial input can achieve competitive performance compared with those fed with the full input, e.g., leveraging claims without evidence for fact verification (Schuster et al., 2019; Utama et al., 2020; Du et al., 2021b), choosing a plausible story ending in the narrative cloze test without looking at the story (Cai et al., 2017), question answering based on biased positional predictions on the reference document (Jia and Liang, 2017; Kaushik and Lipton, 2018), selecting the appropriate warrant with claims only (without the reason) in argument reasoning comprehension (Niven and Kao, 2019; Branco et al., 2021), etc. This paper is the first work to diagnose whether existing entity typing models have exploited spurious correlations, including the common lexical bias, the partial-input (mention-context) bias, and the task-specific dependency bias.…”
Section: Related Work (mentioning, confidence: 99%)
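The partial-input diagnostic described in this statement can be sketched with two bag-of-words classifiers: one sees the full premise-hypothesis pair, the other only the hypothesis. If the partial-input model is competitive, the dataset likely contains annotation artifacts. The toy pairs, labels, and linear models below are illustrative assumptions, not any cited paper's setup.

```python
# Sketch of a partial-input (hypothesis-only) baseline for NLI.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

premises = ["a dog runs outside", "kids play soccer in the park",
            "a man is sleeping", "a woman reads a book"]
hypotheses = ["an animal is outdoors", "children are outside",
              "the man is running", "nobody is reading"]
labels = ["entailment", "entailment", "contradiction", "contradiction"]  # toy labels

def accuracy(texts, labels):
    X = CountVectorizer().fit_transform(texts)
    clf = LogisticRegression().fit(X, labels)
    return clf.score(X, labels)  # training accuracy; use a held-out split in practice

full = accuracy([p + " " + h for p, h in zip(premises, hypotheses)], labels)
partial = accuracy(hypotheses, labels)  # hypothesis-only baseline
print(f"full-input acc: {full:.2f}, hypothesis-only acc: {partial:.2f}")
```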
“…When biases exist in the dataset, neural networks tend to capture spurious associations and exploit dataset biases as shortcuts to obtain higher evaluation performance instead of truly understanding the language and images; i.e., the models may not learn the intrinsic attributes needed for better generalization and reasoning. Recently, several research works 18,19 have shown that intrinsic attributes are usually more difficult to learn than bias attributes: in the early training phase, the model first learns shortcut features for fast loss reduction, and only afterwards gradually learns intrinsic attributes for further loss reduction. In short, in the early training stage, the model first learns to fit the bias-aligned samples, and then gradually fits the bias-conflicting samples.…”
Section: Alleviating Shortcut Learning Behavior (mentioning, confidence: 99%)
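The training-dynamics claim above (bias-aligned samples are fitted first) suggests a simple diagnostic: record per-sample loss over the first few epochs and flag the samples whose loss drops earliest as likely bias-aligned. The toy model, the synthetic "bias" feature, and the 3-epoch window below are illustrative assumptions, not the cited works' method.

```python
# Sketch: track early-epoch per-sample loss; samples fitted first
# (lowest accumulated loss) are candidates for bias-aligned examples.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 10)              # toy features
y = (X[:, 0] > 0).long()             # labels correlated with feature 0 (a "bias" feature)
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.CrossEntropyLoss(reduction="none")

early_loss = torch.zeros(len(X))
for epoch in range(3):               # early-training window (assumed)
    per_sample = loss_fn(model(X), y)
    early_loss += per_sample.detach()
    opt.zero_grad()
    per_sample.mean().backward()
    opt.step()

likely_bias_aligned = (early_loss / 3).argsort()[:8]  # lowest early loss first
print(likely_bias_aligned.tolist())
```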