2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00348

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation

Cited by 25 publications (15 citation statements)
References 20 publications
“…Recently, benefiting from contrastive training [65] and the scalability of modern backbones, CLIP [43] and ALIGN [30] learn strong visual representations on large-scale image-text datasets, advancing transfer performance on downstream vision tasks. Later works improve CLIP by introducing auxiliary loss functions to assist the image-text contrastive loss, such as an image self-supervision loss [40,34], a self-distillation loss [12], and token-wise max similarity [62].…”
Section: Related Work (mentioning)
confidence: 99%
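
The image-text contrastive objective these citing works build on is the symmetric InfoNCE loss popularized by CLIP. A minimal PyTorch sketch follows; the function name, tensor shapes, and temperature value are illustrative assumptions, not the exact formulation used in the cited papers.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    image_feats, text_feats: (B, D) tensors from the two encoders;
    matching pairs sit on the diagonal of the similarity matrix.
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (B, B) cosine similarities, scaled by the temperature.
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```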
“…ALIGN [18] further scales up the pre-training dataset to a noisier set of 1B data points. [7] proposes a distillation-based loss to better handle the noise in the dataset. Efficient-CLIP [44] is a concurrent work that uses text-only data to perform a unimodal MLM task.…”
Section: Related Work (mentioning)
confidence: 99%
“…Knowledge Distillation: In conventional knowledge distillation [12,17,24,26,30,33,37,44,47,50], a larger model serves as the teacher to a smaller student model. In self-distillation scenarios [11,14,28], the size of the teacher model is typically comparable to that of the student model. The goal is usually to obtain a computationally lighter and more efficient framework while still maintaining similar or even higher accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
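
The self-distillation described above typically matches a student's softened predictions to those of a teacher of comparable size, often a detached or exponential-moving-average copy of the same network. The sketch below assumes a generic logits-to-logits KL distillation term in PyTorch and illustrates the general idea rather than the specific loss of the cited works.

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.

    The teacher is assumed to be a detached copy or EMA snapshot of the
    same network, so no gradients flow through its predictions.
    """
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```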