2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01434

When Does Contrastive Visual Representation Learning Work?

Abstract: Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive self-supervised learning on four diverse large-scale datasets. By looking through the lenses of data quantity, data…
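For readers unfamiliar with the setup the abstract refers to, below is a minimal sketch of an InfoNCE-style contrastive objective of the kind used in SimCLR-like pretraining. It is an illustration only: the encoder, augmentation pipeline, batch construction, and temperature value are assumptions and are not taken from the paper.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (SimCLR-like).
# z1 and z2 are embeddings of two augmented views of the same N images.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, D)
    sim = z @ z.t() / temperature              # (2N, 2N) scaled cosine similarities
    n = z1.size(0)
    # Remove self-similarity so an embedding cannot be its own positive.
    sim.fill_diagonal_(float('-inf'))
    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Example usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(info_nce_loss(z1, z2))
```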

Cited by 66 publications (22 citation statements) · References: 38 publications
“…Nevertheless, a recent SSL study also supports our claim and sheds some light on this issue as follows: [41] claims that current self-supervised methods learn representations that can easily disambiguate coarse-grained visual concepts like those in ImageNet. However, as the granularity of the concepts becomes finer, self-supervised performance lags further behind supervised baselines.…”
Section: Discussion (supporting)
confidence: 76%
“…We attribute this to the smaller amount of domain‐specific unlabeled image data used for SSL pretraining. Evidence suggests that performance of SSL models increases with the availability of larger unlabeled datasets (Cole et al., 2021).…”
Section: Results (mentioning)
confidence: 99%
“…For example, D_w was designed to be large enough for self-supervised learning, which has received surprisingly little attention in the WSOL community [9]. We are also interested in using iNatLoc500 to study whether self-supervised learning methods can be improved by using WSOL methods to select crops [40], especially in the context of fine-grained data [15]. For the object detection community, the clean boxes in iNatLoc500 can (i) serve as a test set for object detectors trained on the noisy iNat17 boxes, (ii) be used to study the problem of learning multi-instance detectors from one box per image, and (iii) be used to analyze the role of label granularity in object detection.…”
Section: Discussion (mentioning)
confidence: 99%
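As a purely hypothetical illustration of the crop-selection idea raised in the statement above (not an implementation from either cited work), the sketch below samples a contrastive-learning crop from inside a predicted localization box rather than uniformly over the image. The box format, crop size, and helper name are assumptions for illustration.

```python
# Hypothetical sketch: sample an SSL training crop constrained to a WSOL-predicted box.
import random
from PIL import Image

def crop_within_box(img: Image.Image, box, crop_size: int = 224) -> Image.Image:
    """box = (x_min, y_min, x_max, y_max) in pixels, e.g. from a WSOL model."""
    x0, y0, x1, y1 = (int(v) for v in box)
    # Latest valid top-left corner keeping the crop inside both the box and the image.
    hi_x = min(x1, img.width) - crop_size
    hi_y = min(y1, img.height) - crop_size
    cx = random.randint(x0, hi_x) if hi_x > x0 else max(0, min(x0, img.width - crop_size))
    cy = random.randint(y0, hi_y) if hi_y > y0 else max(0, min(y0, img.height - crop_size))
    return img.crop((cx, cy, cx + crop_size, cy + crop_size))

# Two such crops from the same image could then serve as the positive pair
# for a contrastive objective like the one sketched earlier.
```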