2020
DOI: 10.48550/arxiv.2001.01385
Preprint

Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning

Abstract: We investigate learning a ConvNet classifier with class-imbalanced data. We found that a ConvNet over-fits significantly to the minor classes that do not have sufficient training instances, even if it is trained using vanilla empirical risk minimization (ERM). We conduct a series of analyses and argue that feature deviation between the training and test instances serves as the main cause. We propose to incorporate class-dependent temperatures (CDT) in learning a ConvNet: CDT forces the minor-class instances to …
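To make the mechanism concrete, below is a minimal PyTorch sketch of a class-dependent-temperature (CDT) cross-entropy loss as the abstract describes it: each class's logit is divided by a class-dependent temperature during training, with rarer classes receiving larger temperatures. The parameterisation a_j = (n_max / n_j) ** gamma and the value of gamma are illustrative assumptions, not the authors' reference implementation.

    import torch
    import torch.nn.functional as F

    class CDTLoss(torch.nn.Module):
        """Cross-entropy with class-dependent temperatures (illustrative sketch)."""
        def __init__(self, class_counts, gamma=0.1):
            super().__init__()
            counts = torch.as_tensor(class_counts, dtype=torch.float)
            # Larger temperature for rarer classes: a_j = (n_max / n_j) ** gamma (assumed form).
            self.register_buffer("temperatures", (counts.max() / counts) ** gamma)

        def forward(self, logits, targets):
            # Scale each class's logit by 1 / a_j during training;
            # plain, unscaled logits are used for prediction at test time.
            scaled = logits / self.temperatures.to(logits.device)
            return F.cross_entropy(scaled, targets)

    # Example with three classes of very different training frequencies.
    loss_fn = CDTLoss(class_counts=[5000, 500, 50], gamma=0.1)
    loss = loss_fn(torch.randn(8, 3), torch.randint(0, 3, (8,)))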

Cited by 22 publications (40 citation statements) | References 52 publications
“…To place our method's empirical performance in the context of prior works, we extensively compare our method to state-of-the-art distribution shift methods. On binary CIFAR10, we compare to label-distribution-aware margin (LDAM) loss [Cao+19], class-dependent temperatures (CDT) loss [Ye+20], logit-adjusted (LA) loss [Men+20], and vector-scaling (VS) loss [Kin+21]. On CelebA, we compare to VS loss only since it encapsulates CDT loss, LA loss, and LDAM loss.…”
Section: Comparing Against Prior Distribution Shift Methods (mentioning)
confidence: 99%
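For context on why the VS loss can stand in for the others, it pairs a multiplicative logit scaling (the CDT-style factor) with an additive logit adjustment (the LA-style shift). The sketch below shows that combined form in PyTorch; the parameterisations of delta and iota and the values of gamma and tau are common choices from the literature, not copied from any one of the cited implementations.

    import torch
    import torch.nn.functional as F

    def vs_loss(logits, targets, class_counts, gamma=0.2, tau=1.0):
        """Vector-scaling-style loss: multiplicative plus additive logit adjustments (sketch)."""
        counts = torch.as_tensor(class_counts, dtype=torch.float, device=logits.device)
        priors = counts / counts.sum()
        delta = (counts / counts.max()) ** gamma   # multiplicative, CDT-like: shrinks minor-class logits
        iota = tau * torch.log(priors)             # additive, LA-like: subtracts more from minor classes
        return F.cross_entropy(logits * delta + iota, targets)

    # gamma > 0 with tau = 0 recovers a purely multiplicative (CDT-style) adjustment;
    # gamma = 0 with tau > 0 recovers a purely additive (LA-style) adjustment.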
“…However, without regularization or early stopping these corrections are ineffective, since additive corrections to the logits are analogous to importance weighting with exponential-tailed losses. Multiplicative logit corrections [Ye+20], possibly combined with additive corrections [Kin+21], have also been proposed; these do affect the implicit bias of the learnt classifier. However, they do not correspond to importance weighting algorithms, and, further, these works do not provide guidance on how one should select the multiplicative corrections in practice.…”
Section: Related Work (mentioning)
confidence: 99%
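As a sketch of the reasoning behind that claim (binary, separable-data setting; not quoted from the cited works): for an exponential-tailed loss with margin $y f(x)$, a per-class additive logit correction $\iota_y$ factors out of the loss,

$$\exp\!\big(-(y f(x) + \iota_y)\big) \;=\; e^{-\iota_y}\,\exp\!\big(-y f(x)\big),$$

so it only multiplies each class's loss terms by a constant weight $e^{-\iota_y}$, exactly as importance weighting would, and such weights are known not to change the implicit max-margin bias of gradient descent on separable data. A multiplicative correction $\Delta_y$ instead rescales the margin itself, $\exp(-\Delta_y\, y f(x))$, which does change the classifier the training converges to.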
“…Cost-sensitive learning seeks to re-balance classes by adjusting the loss values for different classes during training [119], [120], [121], [122], [123], [124], [125]. Recent studies have developed various cost-sensitive long-tailed learning methods to handle class imbalance, including class-level re-weighting and class-level re-margining.…”
Section: Cost-sensitive Learning (mentioning)
confidence: 99%
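As one concrete instance of class-level re-weighting, the sketch below weights each class's cross-entropy term; the "effective number of samples" weighting of Cui et al. (2019) is used here as one common choice, and the value of beta is illustrative.

    import torch
    import torch.nn.functional as F

    def effective_number_weights(class_counts, beta=0.999):
        # Weight w_j proportional to (1 - beta) / (1 - beta ** n_j), normalised to mean 1.
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        weights = (1.0 - beta) / (1.0 - beta ** counts)
        return weights / weights.sum() * len(class_counts)

    weights = effective_number_weights([5000, 500, 50])
    logits, targets = torch.randn(8, 3), torch.randint(0, 3, (8,))
    # F.cross_entropy applies the per-class weight to each sample's loss term.
    loss = F.cross_entropy(logits, targets, weight=weights)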
“…To address this, several studies [80], [95] proposed to use the scale-invariant cosine classifier $p = \varphi\!\left(\frac{w^{\top} f}{\lVert w \rVert\,\lVert f \rVert} \,/\, \tau + b\right)$, where both the classifier weights and the sample features are normalized. Here, the temperature $\tau$ should be chosen carefully [125], or the classifier's performance will be negatively affected.…”
Section: Classifier Design (mentioning)
confidence: 99%
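The sketch below spells out that classifier head: both the weights and the features are L2-normalised, the cosine similarity is divided by the temperature tau, and a bias is added before the softmax phi. The value of tau here is a placeholder; as the quoted passage notes, it has to be set with care.

    import torch
    import torch.nn.functional as F

    class CosineClassifier(torch.nn.Module):
        def __init__(self, feat_dim, num_classes, tau=1 / 16):  # tau is illustrative
            super().__init__()
            self.weight = torch.nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
            self.bias = torch.nn.Parameter(torch.zeros(num_classes))
            self.tau = tau

        def forward(self, features):
            w = F.normalize(self.weight, dim=1)   # ||w|| = 1 per class
            f = F.normalize(features, dim=1)      # ||f|| = 1 per sample
            # Cosine similarity scaled by 1/tau, plus bias; apply softmax (phi) to get probabilities p.
            return f @ w.t() / self.tau + self.bias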
“…When learning with such labels, model predictions become biased towards the head classes [18], mainly due to the skewed decision boundary of the linear classifier [7,28,59]. Accordingly, the pseudo-labels generated from U under this scheme also tend to be biased, even relative to the underlying distribution of each class $M_k$ [29], leading to performance degradation on balanced test data.…”
Section: Introduction (mentioning)
confidence: 99%
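For concreteness, the pseudo-labelling step this passage refers to usually looks like the sketch below: the classifier's confident predictions on the unlabeled set U are reused as training targets, so a head-biased classifier copies its bias straight into the pseudo-labels. The confidence threshold and the function name are illustrative.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def generate_pseudo_labels(model, unlabeled_batches, threshold=0.95):
        pseudo = []
        for x in unlabeled_batches:
            probs = F.softmax(model(x), dim=1)
            conf, label = probs.max(dim=1)
            keep = conf >= threshold               # keep only confident predictions
            pseudo.append((x[keep], label[keep]))  # these labels inherit the model's head-class bias
        return pseudo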