IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium
DOI: 10.1109/igarss46834.2022.9883983

Self-Supervised Vision Transformers for Joint SAR-Optical Representation Learning

Abstract: Self-supervised learning (SSL) has attracted much interest in remote sensing and Earth observation due to its ability to learn task-agnostic representations without human annotation. While most of the existing SSL works in remote sensing utilize ConvNet backbones and focus on a single modality, we explore the potential of vision transformers (ViTs) for joint SAR-optical representation learning. Based on DINO, a state-of-the-art SSL algorithm that distills knowledge from two augmented views of an input image, w…

Cited by 26 publications (15 citation statements)
References 13 publications
“…Multi-modal/temporal self-supervised learning. As one of the most important characteristics of remote sensing data, multi-modality is a significant aspect to be explored in self-supervised representation learning [214,220,151]. On the other hand, multi-temporal image analysis is attracting more interest because of the increasing frequency of data acquisition and transfer.…”
Section: B. Benchmark Results
confidence: 99%
“…Jain et al [219] perform pre-training by contrasting SAR and optical images using BYOL. Wang et al [220] proposed DINO-MM, a joint SAR-optical representation learning approach with vision transformers. Following the same self-supervised mechanism as DINO, the authors introduced RandomSensorDrop to let the model see all possible combinations of both modalities during training.…”
Section: InfoNCE
confidence: 99%
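The RandomSensorDrop idea described above can be sketched as a simple augmentation that randomly keeps the optical channels, the SAR channels, or both, so the network encounters every modality combination during training. This is a minimal illustrative sketch, not the authors' implementation: the function name's behavior here (zeroing the dropped modality and channel-stacking the result) is an assumption, and DINO-MM's actual code may handle masking differently.

```python
import numpy as np

def random_sensor_drop(optical, sar, rng=None):
    """Hypothetical sketch of a RandomSensorDrop-style augmentation.

    Randomly keeps optical only, SAR only, or both modalities, then
    channel-stacks them into a single input array.
    """
    rng = rng if rng is not None else np.random.default_rng()
    mode = rng.integers(3)  # 0: keep both, 1: drop SAR, 2: drop optical
    if mode == 1:
        sar = np.zeros_like(sar)       # blank out SAR channels
    elif mode == 2:
        optical = np.zeros_like(optical)  # blank out optical channels
    # Stack along the channel axis to form one multi-modal input
    return np.concatenate([optical, sar], axis=0)
```

Over many training iterations this exposes the backbone to optical-only, SAR-only, and joint inputs with the same spatial layout.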
“…Continuing such an approach of joint learning using optical and SAR images, the authors in [96] explore the potential of the non-contrastive method DINO [47] with backbones based on transformer architectures instead of CNNs. As described in Section 2.2.4, DINO is an SSL paradigm that involves a student network trained to extract consistent predictions against a teacher network whose weights are defined as the EMA of the student weights.…”
Section: Non-contrastive
confidence: 99%
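The teacher update described above — the teacher's weights as an exponential moving average (EMA) of the student's — can be written in a few lines. This is a generic sketch of the EMA rule rather than DINO's exact code; the function name and the momentum value of 0.996 are illustrative assumptions (DINO schedules the momentum over training).

```python
def ema_update(teacher_params, student_params, momentum=0.996):
    """Generic EMA teacher update: teacher <- m * teacher + (1 - m) * student.

    Operates on plain lists of floats for illustration; in practice this
    would run over network parameter tensors with gradients disabled.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

With a momentum close to 1, the teacher evolves slowly and provides stable targets for the student's consistency loss.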
“…Wang et al 2022 [96] Non-contrastive BigEarthNet-MM Explore DINO paradigm with transformer backbones for optical-SAR joint representation learning in remote sensing. Propose an augmentation which randomly masks either multi-spectral or polarimetric SAR channels.…”
Section: Jain et al. 2022 [95]
confidence: 99%