Masked Autoencoders in Computer Vision: A Comprehensive Survey

Zhou, Zexian; Liu, Xiaojing

doi:10.1109/access.2023.3323383

Cited by 3 publications

References 114 publications

BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification

Kukushkin, Bogdan, Schmid (2024)

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

XoFTR: Cross-modal Feature Matching Transformer

Tuzcuoğlu, Köksal, Sofu et al. (2024)

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

MAPM:PolSAR Image Classification with Masked Autoencoder Based on Position Prediction and Memory Tokens

Wang, Li, Quan et al. (2024)

Remote Sensing

Deep learning methods have shown significant advantages in polarimetric synthetic aperture radar (PolSAR) image classification. However, their performance relies on a large amount of labeled data. To alleviate this problem, this paper proposes a PolSAR image classification method with a Masked Autoencoder based on Position prediction and Memory tokens (MAPM). First, MAPM designs a transformer-based Masked Autoencoder (MAE) for pre-training, which can boost feature learning and improve classification results for a given number of labeled samples. Second, since the transformer is relatively insensitive to the order of the input tokens, a position prediction strategy is introduced in the encoder part of the MAE, which helps it capture subtle differences and discriminate complex, blurry boundaries in PolSAR images. In the fine-tuning stage, learnable memory tokens are added to improve classification performance. In addition, L1 loss is used for MAE optimization to enhance the robustness of the model to outliers in PolSAR data. Experimental results show the effectiveness and advantages of the proposed MAPM in PolSAR image classification. Specifically, MAPM achieves performance gains of about 1% in classification accuracy compared with existing methods.
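
The abstract above mentions MAE pre-training with an L1 reconstruction loss. As a rough illustration of that general idea only, and not the MAPM authors' implementation, the sketch below masks a random subset of patches and computes the mean absolute error on them. The function names, the 0.75 mask ratio, the toy patch shapes, and the choice to score only masked patches are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector of length num_patches; True marks a masked patch.

    Hypothetical helper: the mask ratio is an assumption, not a value from the abstract.
    """
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask


def masked_l1_loss(pred, target, mask):
    """Mean absolute error computed over masked patches only.

    pred, target: arrays of shape (num_patches, patch_dim)
    mask: boolean array of shape (num_patches,)
    Restricting the loss to masked patches is an assumption for this sketch.
    """
    if not mask.any():
        return 0.0
    return float(np.abs(pred[mask] - target[mask]).mean())


rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))       # 16 toy patches with 8 values each
mask = random_patch_mask(16, 0.75, rng)  # hide 12 of the 16 patches
pred = np.zeros_like(patches)            # stand-in for a decoder's reconstruction
loss = masked_l1_loss(pred, patches, mask)
```

Compared with a squared-error loss, the absolute error grows only linearly with the size of a residual, which is the usual reason L1 is described as more robust to outliers.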
