CompRess: Self-Supervised Learning by Compressing Representations

Koohpayegani, Soroush Abbasi; Tejankar, Ajinkya; Pirsiavash, Hamed

doi:10.48550/arxiv.2010.14713

Cited by 2 publications

(4 citation statements)

References 0 publications


“…Many of them achieve the SOTA performance on the downstream linear classification task with the backbone network fixed (Zhang, Isola, and Efros 2016; Oord, Li, and Vinyals 2018; Bachman, Hjelm, and Buchwalter 2019). However, little attention has been paid to training small models (Howard et al 2017; Tan and Le 2019) solely under the contrastive learning framework, for its failure has been widely observed (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al 2021; Xu et al 2021; Gu, Liu, and Tian 2021). In this paper, we want to fill in the void of training small models with and only with contrastive learning signals.…”

Section: Self-supervised Contrastive Learning (mentioning)

confidence: 99%

“…Currently, knowledge distillation (Hinton, Vinyals, and Dean 2015) becomes a widely acknowledged paradigm to solve the slow convergence and difficulty of optimization in self-supervised pretext task for small models (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al 2021; Xu et al 2021; Gu, Liu, and Tian 2021). CompRess (Koohpayegani, Tejankar, and Pirsiavash 2020) and SEED (Fang et al 2021) distill the small models based on the similarity distributions among different instances randomly sampled from a dynamically maintained queue. DisCo removes the negative sample queue and straightforwardly distills the final embedding to transmit the teacher's knowledge to a lightweight model.…”

Section: Self-supervised Small Models (mentioning)

confidence: 99%
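The similarity-distribution distillation mentioned in the statement above can be illustrated with a minimal sketch. It is not the CompRess or SEED implementation: the function names, the temperature value, and the assumption that teacher and student each embed the same set of queued anchor images are all illustrative choices. The idea shown is only that the student is trained so that its softmax over similarities to the anchors matches the teacher's.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def softmax(scores, temperature):
    # Numerically stable softmax over temperature-scaled scores.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(query, anchors, temperature):
    # Cosine similarity of the query to each anchor, turned into a distribution.
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(anchor))) for anchor in anchors]
    return softmax(sims, temperature)

def distillation_loss(teacher_query, student_query,
                      teacher_anchors, student_anchors,
                      temperature=0.04):  # temperature is an assumed value
    # KL(teacher || student) between the two similarity distributions.
    # Teacher and student embeddings may have different dimensions, because
    # each model is only compared with its own anchor embeddings.
    p = similarity_distribution(teacher_query, teacher_anchors, temperature)
    q = similarity_distribution(student_query, student_anchors, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In a training loop the anchors would come from a queue that is refreshed with recent embeddings, and the loss would be minimized with respect to the student only; both of those parts are left out of this sketch.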


On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

Shi, Zhang, Tang et al. 2022, AAAI


It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expenses of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) the small models can complete the pretext task without overfitting despite their limited capacity and (ii) they universally suffer the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baseline performances of five small architectures with considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals. The code is available at https://github.com/WOWNICE/ssl-small.


“…Currently, knowledge distillation (Hinton, Vinyals, and Dean 2015) becomes a widely acknowledged paradigm to solve the slow convergence and difficulty of optimization in self-supervised pretext task for small models (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al 2021; Xu et al 2021; Gu, Liu, and Tian 2021). CompRess (Koohpayegani, Tejankar, and Pirsiavash 2020) and SEED (Fang et al 2021) distill the small models based on the similarity distributions among different instances randomly sampled from a dynamically maintained queue. DisCo removes the samples queue and straightforwardly distills the final embedding to transmit the teacher's knowledge to a lightweight model.…”

Section: Self-supervised Learning For Small Models (mentioning)

confidence: 99%

“…In this setting, the problem of training self-supervised small models boils down to two phases. It first trains a large learner in a self-supervised fashion and then trains the small learner to mimic the representation distribution of the large learner (Koohpayegani, Tejankar, and Pirsiavash 2020; Fang et al 2021; Xu et al 2021; Gu, Liu, and Tian 2021).…”

Section: Introduction (mentioning)

confidence: 99%

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

Shi, Zhang, Tang et al. 2021, Preprint


It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via knowledge distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expenses of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) small models can complete the pretext task without overfitting despite their limited capacity; (ii) small models universally suffer the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baseline of five small architectures with considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals.
