2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01404

Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

Cited by 27 publications (13 citation statements)
References 17 publications
“…Prior works tend to believe that asymmetric designs are necessary for avoiding complete feature collapse (Zhang et al, 2022a), while we show that a fully symmetric architecture, dubbed SymSimSiam (Symmetric Simple Siamese network), can also avoid complete collapse. Specifically, we simply align the positive pair (x, x+) with a symmetric alignment loss,…”
Section: Asymmetry Is the Key To Alleviate Dimensional Collapse
Citation type: mentioning
confidence: 68%
“…Some existing works are proposed to understand some specific non-contrastive techniques, mostly focusing on the predictor head proposed by BYOL (Grill et al, 2020). From an empirical side, Chen & He (2021) think that the predictor helps approximate the expectation over augmentations, and Zhang et al (2022a) take a center-residual decomposition of representations for analyzing the collapse. From a theoretical perspective, Tian et al (2021) analyze the dynamics of predictor weights under simple linear networks, and Wen & Li (2022) obtain optimization guarantees for two-layer nonlinear networks.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This effect can be present a) when learning focuses only on few features and/or b) when the covariance structure in the data is insufficiently extracted. Explaining away can be caused by saturation of the InfoNCE objective [2, 11, 12]. To ameliorate these drawbacks, CLOOB [2] has introduced the InfoLOOB objective together with Hopfield networks as a promising method for contrastive learning.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…They consider temperature as a measure of embedding confidence and propose temperature as uncertainty. Zhang et al [54] adopt dual temperature in a contrastive InfoNCE for realizing independent control of two hardness-aware sensitiveness. Previous temperature analysis works mainly focus on the penalty's unevenness of negative samples within an anchor or the sum of penalties of different anchors within a training batch.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…When the temperature is fixed, the gradient's magnitude with respect to a positive sample is equal to the sum of gradients with respect to all negative samples. Prior works of temperature analysis mainly focus on the penalty's unevenness of negative samples within an anchor [48], or the sum of penalties of different anchors within a training batch [54]. Differently, we pay attention to the proportion of penalties between the positive sample and negative samples.…”
Section: Adaptive Contrastive
Citation type: mentioning
confidence: 99%
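
The gradient relationship quoted in the last statement can be checked numerically. The sketch below assumes the standard single-temperature InfoNCE loss with one positive and several negatives, L = -log(exp(s_p/t) / (exp(s_p/t) + sum_j exp(s_j/t))). The function name `info_nce_grads` and the similarity values are illustrative assumptions and are not taken from any of the cited papers.

```python
import math


def info_nce_grads(pos_sim, neg_sims, temperature):
    """Gradients of the InfoNCE loss w.r.t. the similarity scores.

    Assumed loss (one positive, several negatives):
        L = -log(exp(pos/t) / (exp(pos/t) + sum_j exp(neg_j/t)))
    Returns (dL/d pos_sim, [dL/d neg_sim_j, ...]).
    """
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Subtract the max logit for numerical stability of the softmax.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # dL/d s_p = (P_p - 1) / t  and  dL/d s_j = P_j / t.
    grad_pos = (probs[0] - 1.0) / temperature
    grad_negs = [p / temperature for p in probs[1:]]
    return grad_pos, grad_negs


# Hypothetical cosine similarities for one anchor.
grad_pos, grad_negs = info_nce_grads(0.8, [0.3, -0.1, 0.5], temperature=0.2)
print(abs(grad_pos), sum(grad_negs))
```

Because the softmax probabilities sum to one, the two printed values agree up to floating-point error for any fixed temperature, which is the equality the statement describes.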