2020
DOI: 10.48550/arxiv.2010.01809
Preprint

Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

Abstract: Natural data are often long-tail distributed over semantic classes. Existing recognition methods tend to focus on tail performance gain, often at the expense of head performance loss from increased classifier variance. The low tail performance manifests itself in large inter-class confusion and high classifier variance. We aim to reduce both the bias and the variance of a long-tailed classifier by RoutIng Diverse Experts (RIDE). It has three components: 1) a shared architecture for multiple classifiers (expert…

Cited by 51 publications (89 citation statements)
References 28 publications
Citation types: 0 supporting, 89 mentioning, 0 contrasting
“…Recent works proposed decoupled training [18,48], which first obtains a good representation through conventional training and then calibrates the classifier. Other techniques such as ensembling [40], self-supervised learning [20,44], and knowledge distillation [15,20] have been verified to be useful in long-tailed learning.…”
Section: Related Work (mentioning)
confidence: 99%
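The decoupled scheme this statement cites is easy to state concretely. Below is a minimal sketch, assuming a PyTorch setting; the names (`backbone`, `instance_loader`, `balanced_loader`, feature dimension, epoch counts) are illustrative placeholders, not the cited papers' reference code. Stage one trains the whole network on the natural long-tailed distribution; stage two freezes the representation and re-trains only the classifier on class-balanced batches.

```python
# Hedged sketch of two-stage decoupled training for long-tailed recognition,
# in the spirit of the decoupling works cited above [18,48]. All names below
# (backbone, loaders, dims, epochs) are illustrative assumptions.
import torch
import torch.nn as nn

def train_decoupled(backbone, instance_loader, balanced_loader,
                    num_classes, feat_dim=512,
                    stage1_epochs=90, stage2_epochs=10):
    # backbone is assumed to map inputs to (batch, feat_dim) features.
    classifier = nn.Linear(feat_dim, num_classes)
    model = nn.Sequential(backbone, classifier)
    criterion = nn.CrossEntropyLoss()

    # Stage 1: conventional training on the natural (long-tailed) data,
    # which learns the representation.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for _ in range(stage1_epochs):
        for x, y in instance_loader:        # instance-balanced sampling
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

    # Stage 2: freeze the representation and calibrate only the classifier
    # on class-balanced batches.
    for p in backbone.parameters():
        p.requires_grad = False
    classifier.reset_parameters()
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    for _ in range(stage2_epochs):
        for x, y in balanced_loader:        # class-balanced sampling
            opt.zero_grad()
            with torch.no_grad():
                feats = backbone(x)
            criterion(classifier(feats), y).backward()
            opt.step()
    return model
```

The appeal of this recipe is that re-balancing only touches the cheap classifier stage, so head-class representations are not distorted by oversampled tail data.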
“…Each classifier focuses on the classification of a small and relatively balanced group of classes from the data. [40] described a shared architecture for multiple classifiers, a distribution-aware loss, and an expert routing module. The current paper proposes a two-stage approach for training one model with a single classifier.…”
Section: Long-tail Recognition (mentioning)
confidence: 99%
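This statement names the three RIDE components. A minimal architectural sketch of the first one, a shared backbone feeding multiple expert classifiers whose outputs are aggregated, might look as follows; this is an illustrative reading of the design, not the authors' reference implementation, and the distribution-aware diversity loss and the routing module (which decides how many experts to run per sample) are deliberately omitted.

```python
# Illustrative sketch of a multi-expert classifier in the RIDE style [40]:
# experts share early backbone stages, keep independent later stages and
# heads, and their logits are averaged. Module names are assumptions.
import torch
import torch.nn as nn

class MultiExpert(nn.Module):
    def __init__(self, shared: nn.Module, make_expert, num_experts: int = 3):
        super().__init__()
        self.shared = shared  # early layers shared by all experts
        # each expert owns its later layers and classification head
        self.experts = nn.ModuleList(
            [make_expert() for _ in range(num_experts)]
        )

    def forward(self, x):
        h = self.shared(x)
        logits = torch.stack([expert(h) for expert in self.experts])  # (E, N, C)
        return logits.mean(dim=0)  # aggregate the experts' predictions
```

In the full method, a routing module would consult additional experts only for harder samples, trading per-instance compute for accuracy, while a diversity loss pushes the experts toward complementary predictions.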
“…[Figure 1 legend from the citing paper, listing compared methods: τ-norm [10], OLTR [15], LDAM+DRW [1], cRT [10], progressive resampling [10], class-balanced resampling [9], focal loss [14], class-balanced cross-entropy loss, input mix-up [30], manifold mix-up [30], ReMix [2], BBN [34], LFME [26], RIDE [23], plus IM-combined variants (progressive resampling + IM, CB resampling + IM, CB focal loss + IM); grouped into one-stage methods (re-balancing and augmentations) and multi-stage methods (multi-stage training and transfer learning).] Figure 1. Performance of representative long-tailed recognition methods in terms of majority and minority classes compared to the baseline model (a ResNet).…”
Section: Baseline / ACE (Ours) (mentioning)
confidence: 99%