2022
DOI: 10.48550/arxiv.2202.04414
Preprint

Agree to Disagree: Diversity through Disagreement for Better Transferability

Abstract: Gradient-based learning algorithms have an implicit simplicity bias which in effect can limit the diversity of predictors being sampled by the learning procedure. This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features (present in the training data but absent from the test data) and (ii) leveraging only a small subset of predictive features. Such an effect is especially magnified when the test distribution does not exactly match the train distribution…
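The abstract and title point to encouraging diverse predictors through disagreement. Below is a minimal sketch of that idea, assuming a pair of binary classifiers and an unlabeled out-of-distribution (OOD) batch; the coefficient `alpha` and the exact form of the disagreement term are illustrative assumptions, not necessarily the paper's objective.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: two predictors fit the same labeled training
# data while a weighted term pushes them to disagree on unlabeled OOD
# inputs. `alpha` and the disagreement term are assumptions for
# illustration, not necessarily the paper's exact loss.

def agree_to_disagree_loss(model_a, model_b, x_train, y_train, x_ood,
                           alpha=0.1):
    # Both models must fit the training labels (agreement on-distribution).
    ce = (F.cross_entropy(model_a(x_train), y_train)
          + F.cross_entropy(model_b(x_train), y_train))

    # Disagreement on OOD inputs (binary case): p_disagree is the
    # probability that the two models predict different classes.
    p_a = torch.softmax(model_a(x_ood), dim=1)
    p_b = torch.softmax(model_b(x_ood), dim=1)
    p_disagree = (p_a * (1.0 - p_b)).sum(dim=1)
    return ce - alpha * torch.log(p_disagree + 1e-8).mean()
```

Minimizing the negative log of `p_disagree` rewards the pair for committing to different classes on OOD inputs, while the cross-entropy terms keep both models accurate on the training data.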

Cited by 4 publications (7 citation statements)
References 18 publications
“…Another group of papers proposes regularization techniques to learn diverse solutions on the training data, focusing on different groups of features [95, 56, 71, 76]. Xu et al. [101] show how to train orthogonal classifiers, i.e.…”
Section: Related Work (mentioning)
confidence: 99%
“…and Pagliardini et al. [71], we consider Dominoes binary classification datasets, where the top half of the image shows MNIST digits [55] from classes {0, 1}, and the bottom half shows MNIST images from classes {7, 9} (MNIST-MNIST), Fashion-MNIST [99] images from classes {coat, dress} (MNIST-Fashion), or CIFAR-10 [52] images from classes {car, truck} (MNIST-CIFAR). In all Dominoes datasets, the top half of the image (the MNIST 0-1 digits) presents a linearly separable feature; the bottom half presents a harder-to-learn feature.…”
mentioning
confidence: 99%
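The Dominoes construction quoted above is straightforward to reproduce. A minimal sketch of the MNIST-CIFAR variant, assuming torchvision is available; the pairing of digit 0 with "car" and digit 1 with "truck" is an illustrative choice, and the cited papers define the exact splits.

```python
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# Sketch of a Dominoes MNIST-CIFAR dataset: the top half of each image
# is an MNIST digit from {0, 1} (easy, linearly separable feature), the
# bottom half a CIFAR-10 image from {car, truck} (harder feature).
# Correlating digit and vehicle class makes the digit a shortcut
# feature on the training split.

def make_dominoes(n_pairs=1000, correlated=True):
    to_tensor = transforms.ToTensor()
    mnist = datasets.MNIST("data", train=True, download=True,
                           transform=to_tensor)
    cifar = datasets.CIFAR10("data", train=True, download=True,
                             transform=to_tensor)

    digit_idx = {c: (mnist.targets == c).nonzero(as_tuple=True)[0]
                 for c in (0, 1)}
    cifar_targets = torch.tensor(cifar.targets)
    # CIFAR-10 class ids: 1 = automobile ("car"), 9 = truck.
    vehicle_idx = {c: (cifar_targets == c).nonzero(as_tuple=True)[0]
                   for c in (1, 9)}

    images, labels = [], []
    for i in range(n_pairs):
        y = i % 2                                  # 0 -> car, 1 -> truck
        digit = y if correlated else int(torch.randint(2, (1,)))
        top = mnist[int(digit_idx[digit][i // 2])][0]               # 1x28x28
        bottom = cifar[int(vehicle_idx[9 if y else 1][i // 2])][0]  # 3x32x32
        # Upsample the digit to 32x32 and replicate to 3 channels so the
        # halves can be stacked vertically into one 3x64x32 image.
        top = F.interpolate(top[None], size=(32, 32))[0].repeat(3, 1, 1)
        images.append(torch.cat([top, bottom], dim=1))
        labels.append(y)
    return torch.stack(images), torch.tensor(labels)
```

Passing `correlated=False` breaks the digit-vehicle pairing, giving a test split on which a model that learned only the digit shortcut performs at chance.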
“…While it is often assumed that the intuitions from tree-based ensembles carry over to this setting (e.g. Mishtal and Arel, 2012; Fort et al., 2019; Ross et al., 2020; Pagliardini et al., 2022), we find that, surprisingly, deep ensemble performance is anticorrelated with predictive diversity. We demonstrate this phenomenon in Fig.…”
Section: Introduction (mentioning)
confidence: 51%
“…Training an additional network that complements a biased model [19], or ensembles that learn diverse feature sets [20], alleviates the problem that the model only learns a few potentially irrelevant features. To prevent models from using spurious correlations between the image and the class label [21-23], the network's last layer can be fine-tuned on data that does not show such correlations [24]. Kobs et al. [17] investigate the influence of different image factors such as item or background color on different DML models.…”
Section: Related Work (mentioning)
confidence: 99%
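The last statement mentions fine-tuning only the network's last layer on data free of the spurious correlation. A minimal sketch of that recipe, assuming a pretrained torchvision ResNet-18; the random tensors below are a hypothetical stand-in for a small correlation-free dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Sketch of last-layer retraining: freeze the pretrained feature
# extractor and re-fit only the classification head on data in which the
# spurious correlation is absent. The random tensors below are a
# placeholder for such correlation-free data.
clean_x = torch.randn(64, 3, 224, 224)
clean_y = torch.randint(0, 2, (64,))
clean_loader = DataLoader(TensorDataset(clean_x, clean_y), batch_size=16)

model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # new 2-class head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
model.eval()  # keep the frozen BatchNorm statistics from updating
for epoch in range(5):
    for x, y in clean_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```

Because only the head's parameters are optimized and the backbone stays in eval mode, the learned features are untouched; the retraining merely re-weights them toward the non-spurious ones present in the clean data.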