2020
DOI: 10.48550/arxiv.2001.01385
Preprint

Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning

Abstract: We investigate learning a ConvNet classifier with class-imbalanced data. We found that a ConvNet over-fits significantly to the minor classes that do not have sufficient training instances, even if it is trained using vanilla empirical risk minimization (ERM). We conduct a series of analyses and argue that feature deviation between the training and test instances serves as the main cause. We propose to incorporate class-dependent temperatures (CDT) in learning a ConvNet: CDT forces the minor-class instances to …
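To make the mechanism concrete, below is a minimal PyTorch sketch of a class-dependent-temperature (CDT) cross-entropy loss as the abstract describes it: each class's logit is divided by a class-dependent temperature during training, with rarer classes receiving larger temperatures. The parameterisation a_j = (n_max / n_j) ** gamma and the value of gamma are illustrative assumptions, not the authors' reference implementation.

    import torch
    import torch.nn.functional as F

    class CDTLoss(torch.nn.Module):
        """Cross-entropy with class-dependent temperatures (illustrative sketch)."""
        def __init__(self, class_counts, gamma=0.1):
            super().__init__()
            counts = torch.as_tensor(class_counts, dtype=torch.float)
            # Larger temperature for rarer classes: a_j = (n_max / n_j) ** gamma (assumed form).
            self.register_buffer("temperatures", (counts.max() / counts) ** gamma)

        def forward(self, logits, targets):
            # Scale each class's logit by 1 / a_j during training;
            # plain, unscaled logits are used for prediction at test time.
            scaled = logits / self.temperatures.to(logits.device)
            return F.cross_entropy(scaled, targets)

    # Example with three classes of very different training frequencies.
    loss_fn = CDTLoss(class_counts=[5000, 500, 50], gamma=0.1)
    loss = loss_fn(torch.randn(8, 3), torch.randint(0, 3, (8,)))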

Cited by 22 publications (40 citation statements) | References 52 publications
“…To place our method's empirical performance in the context of prior works, we extensively compare our method to state-of-the-art distribution shift methods. On binary CIFAR10, we compare to label-distribution-aware margin (LDAM) loss [Cao+19], class-dependent temperatures (CDT) loss [Ye+20], logit-adjusted (LA) loss [Men+20], and vector-scaling (VS) loss [Kin+21]. On CelebA, we compare to VS loss only since it encapsulates CDT loss, LA loss, and LDAM loss.…”
Section: Comparing Against Prior Distribution Shift Methods (mentioning)
confidence: 99%
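For context on why the VS loss can stand in for the others, it pairs a multiplicative logit scaling (the CDT-style factor) with an additive logit adjustment (the LA-style shift). The sketch below shows that combined form in PyTorch; the parameterisations of delta and iota and the values of gamma and tau are common choices from the literature, not copied from any one of the cited implementations.

    import torch
    import torch.nn.functional as F

    def vs_loss(logits, targets, class_counts, gamma=0.2, tau=1.0):
        """Vector-scaling-style loss: multiplicative plus additive logit adjustments (sketch)."""
        counts = torch.as_tensor(class_counts, dtype=torch.float, device=logits.device)
        priors = counts / counts.sum()
        delta = (counts / counts.max()) ** gamma   # multiplicative, CDT-like: shrinks minor-class logits
        iota = tau * torch.log(priors)             # additive, LA-like: subtracts more from minor classes
        return F.cross_entropy(logits * delta + iota, targets)

    # gamma > 0 with tau = 0 recovers a purely multiplicative (CDT-style) adjustment;
    # gamma = 0 with tau > 0 recovers a purely additive (LA-style) adjustment.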
“…However, without regularization or early stopping these corrections are ineffective, since additive corrections to the logits are analogous to importance weighting with exponential-tailed losses. Multiplicative logit corrections [Ye+20], possibly combined with additive corrections [Kin+21], have also been proposed; these do affect the implicit bias of the learnt classifier. However, they do not correspond to importance weighting algorithms, and, further, these works do not provide guidance on how one should select the multiplicative corrections in practice.…”
Section: Related Work (mentioning)
confidence: 99%
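As a sketch of the reasoning behind that claim (binary, separable-data setting; not quoted from the cited works): for an exponential-tailed loss with margin $y f(x)$, a per-class additive logit correction $\iota_y$ factors out of the loss,

$$\exp\!\big(-(y f(x) + \iota_y)\big) \;=\; e^{-\iota_y}\,\exp\!\big(-y f(x)\big),$$

so it only multiplies each class's loss terms by a constant weight $e^{-\iota_y}$, exactly as importance weighting would, and such weights are known not to change the implicit max-margin bias of gradient descent on separable data. A multiplicative correction $\Delta_y$ instead rescales the margin itself, $\exp(-\Delta_y\, y f(x))$, which does change the classifier the training converges to.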
“…Cost-sensitive learning seeks to re-balance classes by adjusting the loss values for different classes during training [119], [120], [121], [122], [123], [124], [125]. Recent studies have developed various cost-sensitive long-tailed learning methods to handle class imbalance, including class-level re-weighting and class-level re-margining.…”
Section: Cost-sensitive Learning (mentioning)
confidence: 99%
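As one concrete instance of class-level re-weighting, the sketch below weights each class's cross-entropy term; the "effective number of samples" weighting of Cui et al. (2019) is used here as one common choice, and the value of beta is illustrative.

    import torch
    import torch.nn.functional as F

    def effective_number_weights(class_counts, beta=0.999):
        # Weight w_j proportional to (1 - beta) / (1 - beta ** n_j), normalised to mean 1.
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        weights = (1.0 - beta) / (1.0 - beta ** counts)
        return weights / weights.sum() * len(class_counts)

    weights = effective_number_weights([5000, 500, 50])
    logits, targets = torch.randn(8, 3), torch.randint(0, 3, (8,))
    # F.cross_entropy applies the per-class weight to each sample's loss term.
    loss = F.cross_entropy(logits, targets, weight=weights)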
“…To address this, several studies [80], [95] proposed to use the scale-invariant cosine classifier $p = \varphi\!\left(\frac{w^{\top} f}{\lVert w \rVert\,\lVert f \rVert} \,/\, \tau + b\right)$, where both the classifier weights and the sample features are normalized. Here, the temperature $\tau$ should be chosen carefully [125], or the classifier's performance will be negatively affected.…”
Section: Classifier Design (mentioning)
confidence: 99%
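The sketch below spells out that classifier head: both the weights and the features are L2-normalised, the cosine similarity is divided by the temperature tau, and a bias is added before the softmax phi. The value of tau here is a placeholder; as the quoted passage notes, it has to be set with care.

    import torch
    import torch.nn.functional as F

    class CosineClassifier(torch.nn.Module):
        def __init__(self, feat_dim, num_classes, tau=1 / 16):  # tau is illustrative
            super().__init__()
            self.weight = torch.nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
            self.bias = torch.nn.Parameter(torch.zeros(num_classes))
            self.tau = tau

        def forward(self, features):
            w = F.normalize(self.weight, dim=1)   # ||w|| = 1 per class
            f = F.normalize(features, dim=1)      # ||f|| = 1 per sample
            # Cosine similarity scaled by 1/tau, plus bias; apply softmax (phi) to get probabilities p.
            return f @ w.t() / self.tau + self.bias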
“…When learning with such labels, model predictions become biased towards the head classes [18], mainly due to the skewed decision boundary of the linear classifier [7,28,59]. Accordingly, the pseudo-labels generated from U under this scheme also tend to be biased, even relative to the underlying distribution of each class $M_k$ [29], leading to performance degradation on balanced test data.…”
Section: Introduction (mentioning)
confidence: 99%
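For concreteness, the pseudo-labelling step this passage refers to usually looks like the sketch below: the classifier's confident predictions on the unlabeled set U are reused as training targets, so a head-biased classifier copies its bias straight into the pseudo-labels. The confidence threshold and the function name are illustrative.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def generate_pseudo_labels(model, unlabeled_batches, threshold=0.95):
        pseudo = []
        for x in unlabeled_batches:
            probs = F.softmax(model(x), dim=1)
            conf, label = probs.max(dim=1)
            keep = conf >= threshold               # keep only confident predictions
            pseudo.append((x[keep], label[keep]))  # these labels inherit the model's head-class bias
        return pseudo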