2022
DOI: 10.48550/arxiv.2206.01204
Preprint

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Abstract: Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks. Two mainstream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM). ID pulls together the representations of different views from the same image while avoiding feature collapse. It performs well on linear probing but is inferior in detection performance. On the other hand, MIM reconstructs the original content given a masked image. It excels at dense predict…
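To make the two frameworks contrasted in the abstract concrete, the following is a minimal PyTorch-style sketch of the two training objectives. It is not the paper's implementation: the encoder, projector, and decoder modules are placeholders, ID is illustrated with a simple InfoNCE loss, and MIM with a pixel-level reconstruction loss restricted to masked patches.

import torch
import torch.nn.functional as F

def instance_discrimination_loss(encoder, projector, view1, view2, temperature=0.2):
    # InfoNCE-style ID objective: pull two augmented views of the same image
    # together while pushing apart views of different images.
    z1 = F.normalize(projector(encoder(view1)), dim=-1)   # (B, D)
    z2 = F.normalize(projector(encoder(view2)), dim=-1)   # (B, D)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

def masked_image_modeling_loss(encoder, decoder, patches, mask):
    # MIM objective: reconstruct the original patches at masked positions.
    # patches: (B, N, P) flattened patch pixels; mask: (B, N) bool, True = hidden.
    visible = patches * (~mask).unsqueeze(-1)              # zero out masked patches
    recon = decoder(encoder(visible))                      # predict all patch pixels
    per_patch = ((recon - patches) ** 2).mean(dim=-1)      # per-patch squared error
    return per_patch[mask].mean()                          # average over masked patches only

The structural difference is visible directly: ID compares global features of two augmented views, while MIM is supervised by the hidden content of a single masked view.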

Cited by 4 publications (11 citation statements) | References 32 publications
“…A study of the Variance-Invariance-Covariance pattern for the output of each block is conducted in order to better understand the Layer Grafted Pre-training. As illustrated in Figure 6, we find that the VIC pattern of Layer Grafted Pre-training tends … [table fragment: 83.2; SIM (Tao et al., 2022) 83.8; ConMIM (Yi et al., 2022) 83.7; Layer Grafted Pre-training (Ours) 83.9]…”
Section: More Ablations (mentioning)
confidence: 86%
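The Variance-Invariance-Covariance (VIC) pattern referenced in this quote consists of the three statistics popularized by VICReg. The sketch below shows one plausible way to compute them for the output of a given block on two augmented views; the function name and the exact normalization are illustrative assumptions, not the cited paper's code.

import torch

def vic_statistics(z1: torch.Tensor, z2: torch.Tensor, eps: float = 1e-4):
    # z1, z2: (B, D) embeddings of two augmented views, e.g. one block's output.
    # Invariance: mean squared distance between the two views' embeddings.
    invariance = ((z1 - z2) ** 2).mean()

    # Variance: average per-dimension standard deviation; collapse shows up
    # as this value shrinking toward zero.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    variance = 0.5 * (std1.mean() + std2.mean())

    # Covariance: mean squared off-diagonal entry of the covariance matrix;
    # large values indicate redundant, correlated feature dimensions.
    def off_diag_cov(z):
        zc = z - z.mean(dim=0)
        cov = (zc.t() @ zc) / (z.size(0) - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / z.size(1)

    covariance = 0.5 * (off_diag_cov(z1) + off_diag_cov(z2))
    return invariance.item(), variance.item(), covariance.item()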
“…We start by verifying the effectiveness of the proposed Layer Grafted Pre-training by comparing it with state-of-the-art methods. As shown in … [table fragment: (Zhou et al., 2021) 76.0, -, -; SIM (Tao et al., 2022) 76.4, 65.3, -; C-MAE 73.9, 65.3, 77.3; MimCo 70…] … shows better intra-variance: the red (•), green (•) and yellow (•) points of MoCo v3 collapse to a smaller region than the proposed Layer Grafted Pre-training.…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 94%
“…Recent work attempted to combine the power of contrastive learning and masked image modelling. SIM (Tao et al., 2022) incorporated masking into the contrastive learning framework as part of the data augmentation operations. iBOT (Zhou et al., 2021) adopted a siamese network structure, as in contrastive learning, and minimized the distance between the masked branch and the unmasked branch.…”
Section: Related Work (mentioning)
confidence: 99%
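The idea described in the quote above, folding masking into the contrastive augmentation pipeline of a siamese network, can be sketched as follows. This is a generic illustration assuming patch-level random masking, a BYOL-style predictor with stop-gradient on the target branch, and a negative-cosine loss; it is not SIM's or iBOT's released code.

import torch
import torch.nn.functional as F

def random_patch_mask(patches: torch.Tensor, mask_ratio: float = 0.6) -> torch.Tensor:
    # Treat masking as just another augmentation: randomly zero out a
    # fraction of patches. patches: (B, N, D).
    B, N, _ = patches.shape
    keep = torch.rand(B, N, device=patches.device) > mask_ratio
    return patches * keep.unsqueeze(-1)

def siamese_masked_loss(encoder, predictor, patches1, patches2):
    # Siamese objective over one masked view and one intact view: the
    # prediction from the masked branch should match the representation
    # of the unmasked branch (stop-gradient on the target side).
    online = predictor(encoder(random_patch_mask(patches1)))  # masked branch
    with torch.no_grad():
        target = encoder(patches2)                            # unmasked branch
    online = F.normalize(online, dim=-1)
    target = F.normalize(target, dim=-1)
    return 2.0 - 2.0 * (online * target).sum(dim=-1).mean()   # negative cosine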
“…We fine-tuned the pre-trained models (already fine-tuned on ImageNet-1K) for 160K steps on ADE20K. The pre-trained model is delivered using ViT-Large MAE as the teacher and (Tao et al., 2022).…”
Section: Transfer Learning On Downstream Task (mentioning)
confidence: 99%
“…For instance, one concurrent work proposes a combination of contrastive and masked reconstruction objectives using one masked view and one full (unmasked) view. Other recent works (Tao et al., 2022; Assran et al., 2022) use similar asymmetric designs. The key distinction between CAN and concurrent work is that we strike a different balance between simplicity, efficiency, and performance: we focus on developing a simple, efficient and symmetric method, using two masked views and no momentum encoder.…”
Section: Concurrent Work (mentioning)
confidence: 99%
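A rough sketch of the symmetric design described in the last quote, two masked views fed to a single shared encoder with a combined contrastive-plus-reconstruction loss and no momentum encoder, is given below. The mean-pooled global feature, the InfoNCE formulation, and the equal loss weighting are assumptions for illustration rather than the authors' released code.

import torch
import torch.nn.functional as F

def combined_objective(encoder, decoder, projector,
                       patches1, patches2, mask1, mask2, temperature=0.2):
    # Symmetric contrastive + masked-reconstruction objective on two masked
    # views of the same images, with one shared encoder and no momentum
    # encoder. patchesX: (B, N, P) pixels; maskX: (B, N) bool, True = hidden.
    def branch(patches, mask):
        visible = patches * (~mask).unsqueeze(-1)       # hide masked patches
        tokens = encoder(visible)                       # (B, N, D) token features
        pooled = tokens.mean(dim=1)                     # global feature for contrast
        recon = decoder(tokens)                         # (B, N, P) reconstruction
        rec_loss = ((recon - patches) ** 2).mean(-1)[mask].mean()
        return F.normalize(projector(pooled), dim=-1), rec_loss

    z1, rec1 = branch(patches1, mask1)
    z2, rec2 = branch(patches2, mask2)

    logits = z1 @ z2.t() / temperature                  # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    contrastive = 0.5 * (F.cross_entropy(logits, labels) +
                         F.cross_entropy(logits.t(), labels))

    return contrastive + 0.5 * (rec1 + rec2)            # equal weighting (assumption)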