2022 | Preprint
DOI: 10.48550/arxiv.2205.14401

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

Abstract: Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers. However, it remains an open question how to exploit masked autoencoding for learning 3D representations of irregular point clouds. In this paper, we propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds. Unlike the standard transformer in MAE, we modify the encoder and decoder into pyramid architectures t…
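For intuition only, the sketch below illustrates the generic patchify-then-mask idea behind MAE-style point cloud pre-training: points are grouped into patches at several scales with farthest point sampling and k-nearest neighbours, and a fraction of the patches at the coarsest scale is masked before encoding. This is an assumed, simplified illustration, not the authors' architecture or masking strategy; all function names, scales, and ratios are invented for the example.

```python
# Hypothetical sketch of multi-scale patchify + masking for MAE-style point cloud
# pre-training. Names and hyper-parameters are illustrative, not from the paper.
import torch

def farthest_point_sample(xyz, n_centers):
    """Greedy farthest point sampling. xyz: (N, 3) -> indices of n_centers points."""
    N = xyz.shape[0]
    centers = torch.zeros(n_centers, dtype=torch.long)
    dist = torch.full((N,), float("inf"))
    centers[0] = torch.randint(0, N, (1,)).item()
    for i in range(1, n_centers):
        dist = torch.minimum(dist, ((xyz - xyz[centers[i - 1]]) ** 2).sum(-1))
        centers[i] = dist.argmax()
    return centers

def group_knn(xyz, center_idx, k):
    """Group the k nearest neighbours around each center -> (n_centers, k, 3)."""
    centers = xyz[center_idx]                     # (C, 3)
    d = torch.cdist(centers, xyz)                 # (C, N)
    knn_idx = d.topk(k, largest=False).indices    # (C, k)
    return xyz[knn_idx]

def multiscale_mask(xyz, scales=((512, 16), (256, 8), (64, 8)), mask_ratio=0.8):
    """Build patches at several scales; mask a ratio of patches at the coarsest scale."""
    patches = []
    for n_centers, k in scales:
        idx = farthest_point_sample(xyz, n_centers)
        patches.append(group_knn(xyz, idx, k))
        xyz = xyz[idx]                            # the next scale operates on the centers
    coarse = patches[-1]
    n_mask = int(mask_ratio * coarse.shape[0])
    perm = torch.randperm(coarse.shape[0])
    masked_idx, visible_idx = perm[:n_mask], perm[n_mask:]
    return patches, visible_idx, masked_idx

# Usage: 2048 raw points -> hierarchical patches plus a coarse-scale mask.
pts = torch.rand(2048, 3)
patches, vis, msk = multiscale_mask(pts)
```

In an actual framework of this kind, the visible patches would be fed to a hierarchical encoder and a decoder would reconstruct the coordinates of the masked patches.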

Cited by 12 publications (12 citation statements)
References 42 publications
“…Masked Image Modeling. Inspired by BERT [11] for Masked Language Modeling, Masked Image Modeling (MIM) has become a popular pretext task for visual representation learning [6,14,2,46,1,4,51,3,49]. MIM aims to reconstruct the masked tokens from a corrupted input.…”
Section: Related Work (mentioning)
confidence: 99%
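As a minimal, generic sketch of that pretext task (assumed names and hyper-parameters, not any cited paper's recipe): a fraction of patch tokens is replaced by a learnable mask token, and the model is trained to predict the original content only at the masked positions.

```python
# Generic masked-image-modeling sketch: corrupt a fraction of patch tokens and
# reconstruct only those positions. Illustrative only; names are not from any paper.
import torch
import torch.nn as nn

class TinyMIM(nn.Module):
    def __init__(self, dim=192, mask_ratio=0.6):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, dim)      # predicts the original token embedding

    def forward(self, tokens):               # tokens: (B, N, D) patch embeddings
        B, N, D = tokens.shape
        n_mask = int(self.mask_ratio * N)
        idx = torch.rand(B, N).argsort(dim=1)[:, :n_mask]   # positions to corrupt
        mask = torch.zeros(B, N, dtype=torch.bool)
        mask.scatter_(1, idx, True)
        corrupted = torch.where(mask.unsqueeze(-1),
                                self.mask_token.expand(B, N, D), tokens)
        pred = self.head(self.encoder(corrupted))
        # reconstruction loss is computed on the masked positions only
        return ((pred - tokens) ** 2)[mask].mean()

loss = TinyMIM()(torch.randn(2, 196, 192))
```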
“…Annotating point clouds demands significant effort, necessitating self-supervised pre-training methods. Prior approaches primarily focus on object CAD models [21,26,29,39,42,44] and indoor scenes [17,35,46]. Point-BERT [42] applies BERT-like paradigms for point cloud recognition, while Point-MAE [26] reconstructs point patches without the tokenizer.…”
Section: Related Work (mentioning)
confidence: 99%
“…Following masked autoencoder (MAE) [20], Point-MAE [37] reconstructs the coordinates of masked points. Point-M2AE [62] extends the MAE pipeline to hierarchical multi-scale networks. Mask-Point [28] models an implicit representation to avoid information leakage.…”
Section: Related Work (mentioning)
confidence: 99%
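Coordinate reconstruction of this kind is usually supervised with a Chamfer-style distance between the predicted and ground-truth points of each masked patch. Below is a minimal, assumed form of such a loss for illustration; it is not taken verbatim from the cited papers.

```python
# Minimal symmetric Chamfer-style loss between predicted and ground-truth point
# patches, as commonly used to supervise coordinate reconstruction. Simplified sketch.
import torch

def chamfer_loss(pred, gt):
    """pred, gt: (M, K, 3) -- M masked patches of K points each."""
    d = torch.cdist(pred, gt)                 # (M, K, K) pairwise distances
    # nearest-neighbour distance in both directions, averaged
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# Usage: score reconstructions of 51 masked patches of 8 points each.
pred = torch.rand(51, 8, 3)
gt = torch.rand(51, 8, 3)
print(chamfer_loss(pred, gt))
```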