2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00015

Asymmetric Loss For Multi-Label Classification

Cited by 319 publications (171 citation statements) · References 20 publications

“…We now evaluate and compare such representations against state-of-the-art DML models to understand if generic representations that are readily available nowadays actually pose an alternative to explicit application of DML. We select state-of-the-art self-supervision models SwAV [3] (ResNet50 backbone), CLIP [51] trained via natural language supervision on a large dataset of 400 million image and sentence pairs (VisionTransformer [10] backbone), BiT(-M) [30], which trains a ResNet50-V2 [30] on both the standard ImageNet [7] (1 million training samples) and the ImageNet-21k dataset [7,54] with 14 million training samples and over 21 thousand classes, an EfficientNet-B0 [66] trained on ImageNet, and a standard baseline ResNet50 network trained on ImageNet. We note that none of these representations have been additionally adapted to the benchmark sets and only the pretrained representations are being evaluated, in contrast to the DML approaches which have been trained on the respective train splits.…”
Section: Generic Representations Versus Deep Metric Learning
confidence: 99%
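
The excerpt above evaluates frozen, generically pretrained backbones as drop-in embeddings for retrieval, without any adaptation to the benchmark. A minimal sketch of that setup follows, assuming a torchvision ImageNet-pretrained ResNet50 and recall@1 as the retrieval metric; both choices are illustrative, not the cited paper's exact protocol.

```python
import torch
import torchvision.models as models

# Standard ImageNet-pretrained ResNet50; dropping the classifier head makes
# the network emit pooled 2048-d features, i.e. the frozen "pretrained
# representation" is evaluated as-is, with no training on the benchmark.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

def embed(backbone, images):
    """Return L2-normalized penultimate-layer features for a batch of images."""
    with torch.no_grad():
        feats = backbone(images)  # (N, D)
    return torch.nn.functional.normalize(feats, dim=1)

def recall_at_1(embeddings, labels):
    """Fraction of queries whose nearest neighbor (excluding self) shares a label."""
    sims = embeddings @ embeddings.T
    sims.fill_diagonal_(-float("inf"))  # exclude trivial self-matches
    nn_idx = sims.argmax(dim=1)
    return (labels[nn_idx] == labels).float().mean().item()
```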
“…Its ability to quickly fine-tune to a smaller, downstream dataset, generalizing even in a few-shot regime, makes it an attractive backbone for tasks such as out-of-distribution (OOD) detection. In this paper, we use ViT pre-trained on ImageNet-21k [Ridnik et al., 2021].…”
Section: Pre-training Neural Network
confidence: 99%
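
A minimal sketch of the setup this excerpt describes: start from a ViT checkpoint pretrained on ImageNet-21k and fine-tune it on a smaller downstream dataset. The timm checkpoint name, the 10-class head, and the optimizer settings are assumptions for illustration, not the cited paper's configuration.

```python
import timm
import torch

# ViT-Base/16 pretrained on ImageNet-21k (checkpoint name assumed; any
# in21k ViT checkpoint works the same way). num_classes resizes the
# classification head for the hypothetical 10-class downstream task.
model = timm.create_model("vit_base_patch16_224.augreg_in21k",
                          pretrained=True, num_classes=10)

# Fine-tune end to end with a small learning rate, as is typical for
# transferring pretrained ViTs.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```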
“…To demonstrate the general applicability of our TD module across different tasks and assess its capability as a robust feature extractor, we evaluate its performance as a backbone for existing state-of-the-art methods for fine-grained classification and multi-label object classification. We use the "Weakly Supervised Data Augmentation" method [18] for fine-grained classification and "Asymmetric loss" method [29] for multi-label classification, and simply replace the model backbone in both methods with our pretrained ImageNet-1k models. For fine-grained classification, we consider the Caltech-birds (CUB) and Stanford Dogs (Dogs) datasets and assess models on top-1 validation accuracy.…”
Section: Fine-grained and Multi-label Classification
confidence: 99%
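
Since the excerpt above plugs this paper's asymmetric loss (ASL) into a multi-label pipeline, a minimal sketch of the loss itself may help: positives and negatives get separate focusing exponents, and negative probabilities are additionally shifted ("clipped") so that very easy negatives contribute no gradient. The hyperparameters follow the paper's commonly used defaults (gamma_neg=4, gamma_pos=0, clip=0.05); treat the implementation details as an illustration rather than the official code.

```python
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    clip=0.05, eps=1e-8):
    """ASL for multi-label classification.

    logits, targets: (batch, num_labels); targets are 0/1 per label.
    """
    p = torch.sigmoid(logits)
    # Probability shifting: p_m = max(p - m, 0) discards negatives the
    # model already rejects confidently.
    p_neg = (p - clip).clamp(min=0)
    # Positive term: (1 - p)^gamma_pos * log(p); gamma_pos=0 reduces to BCE.
    loss_pos = targets * (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))
    # Negative term: p_m^gamma_neg * log(1 - p_m), down-weighting easy negatives.
    loss_neg = (1 - targets) * p_neg.pow(gamma_neg) * torch.log((1 - p_neg).clamp(min=eps))
    return -(loss_pos + loss_neg).sum(dim=1).mean()
```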