2021
DOI: 10.48550/arxiv.2105.05612
Preprint
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

Abstract: Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a…

Cited by 3 publications (4 citation statements)
References 54 publications
“…Another group of papers proposes regularization techniques to learn diverse solutions on the train data, focusing on different groups of features [95,56,71,76]. Xu et al [101] show how to train orthogonal classifiers, i.e.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [20,19], a debiased dataset was generated using human labor. Various studies [2,33,47,57,62] have attempted to reduce dataset bias using explicit bias labels. These studies [2,33,47,57,45,44] used bias labels for each sample to reduce the influence of the bias labels when classifying target labels.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, [61] proposed the EnD regularizer, which entangles target-correlated features and disentangles biased attributes. Several studies [2,33,62] have designed DNNs with a shared feature extractor and multiple classifiers. In contrast to the shared-feature-extractor methods, [47,53] fabricated a classifier and conditional generative adversarial networks, yielding test samples to determine whether the classifier was biased.…”
Section: Related Work (mentioning)
confidence: 99%
“…EnD (Tartaglione, Barbano, and Grangetto 2021) proposes to entangle the target attribute and disentangle the biased attributes. Multi-expert approaches (Alvi, Zisserman, and Nellåker 2018; Kim et al 2019; Teney et al 2021) use a shared feature extractor with multiple FC layers to classify multiple attributes independently. (McDuff et al 2019; Ramaswamy, Kim, and Russakovsky 2021) use a conditional generator to determine if the trained classifier is biased.…”
(mentioning)
confidence: 99%