2020
DOI: 10.48550/arxiv.2010.14713
Preprint

CompRess: Self-Supervised Learning by Compressing Representations

Cited by 2 publications (4 citation statements)
References 0 publications
“…Many of them achieve SOTA performance on the downstream linear classification task with the backbone network fixed (Zhang, Isola, and Efros 2016; Oord, Li, and Vinyals 2018; Bachman, Hjelm, and Buchwalter 2019). However, little attention has been paid to training small models (Howard et al. 2017; Tan and Le 2019) solely under the contrastive learning framework, as its failure has been widely observed (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al. 2021; Xu et al. 2021; Gu, Liu, and Tian 2021). In this paper, we want to fill the void of training small models with, and only with, contrastive learning signals.…”
Section: Self-supervised Contrastive Learning (mentioning)
confidence: 99%
“…Currently, knowledge distillation (Hinton, Vinyals, and Dean 2015) has become a widely acknowledged paradigm for addressing the slow convergence and optimization difficulty of self-supervised pretext tasks for small models (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al. 2021; Xu et al. 2021; Gu, Liu, and Tian 2021). CompRess (Koohpayegani, Tejankar, and Pirsiavash 2020) and SEED (Fang et al. 2021) distill the small models based on the similarity distributions among different instances randomly sampled from a dynamically maintained queue. DisCo removes the negative sample queue and straightforwardly distills the final embedding to transmit the teacher's knowledge to a lightweight model.…”
Section: Self-supervised Small Models (mentioning)
confidence: 99%
“…Currently, knowledge distillation (Hinton, Vinyals, and Dean 2015) has become a widely acknowledged paradigm for addressing the slow convergence and optimization difficulty of self-supervised pretext tasks for small models (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al. 2021; Xu et al. 2021; Gu, Liu, and Tian 2021). CompRess (Koohpayegani, Tejankar, and Pirsiavash 2020) and SEED (Fang et al. 2021) distill the small models based on the similarity distributions among different instances randomly sampled from a dynamically maintained queue. DisCo removes the sample queue and straightforwardly distills the final embedding to transmit the teacher's knowledge to a lightweight model.…”
Section: Self-supervised Learning for Small Models (mentioning)
confidence: 99%
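
To make the mechanism described in the two statements above concrete, here is a minimal sketch of a similarity-distribution distillation loss of the kind CompRess and SEED use: teacher and student embeddings of the same images are compared against a queue of anchor embeddings, and the student is trained to match the teacher's softmax distribution over those anchors. This is illustrative PyTorch-style code, not the authors' released implementation; the function name, the temperature value, and the queue handling are assumptions.

import torch.nn.functional as F

def similarity_distillation_loss(student_emb, teacher_emb, queue, temperature=0.04):
    # student_emb, teacher_emb: (batch, dim) embeddings of the same images from the
    # student and the frozen teacher; queue: (K, dim) anchor embeddings drawn from a
    # dynamically maintained queue (the temperature value is an assumed hyperparameter).
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    q = F.normalize(queue, dim=1)
    # Cosine similarity of each image to every anchor in the queue: (batch, K).
    s_sim = s @ q.t() / temperature
    t_sim = t @ q.t() / temperature
    # Train the student to match the teacher's distribution over the anchors (KL divergence).
    t_prob = F.softmax(t_sim, dim=1)
    s_logprob = F.log_softmax(s_sim, dim=1)
    return F.kl_div(s_logprob, t_prob, reduction="batchmean")

How the queue is refreshed and whether teacher and student share one temperature differ between methods; a single shared temperature is used here only for brevity.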
“…In this setting, the problem of training self-supervised small models boils down to two phases. It first trains a large learner in a self-supervised fashion and then trains the small learner to mimic the representation distribution of the large learner (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al. 2021; Xu et al. 2021; Gu, Liu, and Tian 2021; …).…”
Section: Introduction (mentioning)
confidence: 99%
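
As a complement to the loss above, here is a minimal sketch of the two-phase recipe described in this last statement, assuming generic PyTorch-style models and data loaders; the interfaces (ssl_loss, distill_loss), optimizer settings, and epoch counts are placeholders rather than values from any of the cited papers.

import torch

def train_two_phase(teacher, student, pretrain_loader, distill_loader,
                    ssl_loss, distill_loss, epochs=(200, 100), lr=0.03):
    # Phase 1: self-supervised pretraining of the large learner.
    opt_t = torch.optim.SGD(teacher.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs[0]):
        for views in pretrain_loader:           # e.g. augmented views of each image
            loss = ssl_loss(teacher, views)     # any contrastive / SSL objective (assumed interface)
            opt_t.zero_grad()
            loss.backward()
            opt_t.step()

    # Phase 2: freeze the teacher and train the small learner to mimic its representations.
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    opt_s = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs[1]):
        for images in distill_loader:
            with torch.no_grad():
                t_emb = teacher(images)
            s_emb = student(images)
            loss = distill_loss(s_emb, t_emb)   # e.g. the similarity-distribution loss sketched above
            opt_s.zero_grad()
            loss.backward()
            opt_s.step()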