2021
DOI: 10.48550/arxiv.2111.14819
Preprint

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Abstract: We present Point-BERT, a new paradigm for learning Transformers to generalize the concept of BERT [8] to 3D point clouds. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of input po…
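The abstract outlines a three-step recipe: group the point cloud into local patches, assign each patch a discrete token with a dVAE tokenizer, then mask a subset of patches and train the Transformer to predict the tokens of the masked ones. Below is a minimal PyTorch sketch of that MPM objective; the tokenizer and transformer modules, the mask ratio, and the tensor shapes are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of a Masked Point Modeling (MPM) objective.
# `tokenizer` and `transformer` are hypothetical placeholder modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mask_patches(num_patches: int, mask_ratio: float = 0.6) -> torch.Tensor:
    """Return a boolean mask selecting which local patches to hide."""
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask

def mpm_loss(transformer: nn.Module,
             tokenizer: nn.Module,
             patch_embeddings: torch.Tensor,  # (B, G, C) embeddings of G local patches
             patch_points: torch.Tensor       # (B, G, K, 3) raw points of each patch
             ) -> torch.Tensor:
    """Cross-entropy between the Transformer's predictions and the
    dVAE-assigned discrete tokens at the masked positions."""
    B, G, _ = patch_embeddings.shape
    with torch.no_grad():
        target_tokens = tokenizer(patch_points)      # (B, G) discrete token ids (long)
    mask = mask_patches(G)                           # (G,) boolean mask
    corrupted = patch_embeddings.clone()
    corrupted[:, mask] = 0.0                         # stand-in for a learnable [MASK] embedding
    logits = transformer(corrupted)                  # (B, G, vocab_size)
    return F.cross_entropy(logits[:, mask].reshape(-1, logits.size(-1)),
                           target_tokens[:, mask].reshape(-1))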

Cited by 12 publications (36 citation statements)
References 62 publications
“…The series of GPT [37,38,5] and BERT [12] apply masked modeling to natural language processing and achieve an extraordinary performance boost on downstream tasks with fine-tuning. Inspired by this, BEiT [4] proposes to match image patches with discrete tokens via dVAE [39] and pre-train a standard vision transformer [14,59] by masked image modeling. On top of that, MAE [20] directly reconstructs the raw pixel values of masked tokens and achieves great efficiency with a high mask ratio.…”
Section: Related Work
confidence: 99%
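The quoted passage contrasts BEiT's discrete-token targets with MAE's direct regression of raw pixel values under a high mask ratio. Below is a minimal PyTorch sketch of that MAE-style recipe (random masking at a 75% ratio, mean-squared error on masked patches only); the tensor shapes and the ratio are illustrative assumptions rather than any paper's exact configuration.

# Minimal sketch of MAE-style random masking and masked-patch regression.
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep only a (1 - mask_ratio) subset of patch tokens, per sample."""
    B, N, C = tokens.shape
    num_keep = int(N * (1.0 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)                 # random permutation per sample
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, C))
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, ids_keep, 0.0)                    # 1 = masked, 0 = visible
    return visible, mask

def mae_reconstruction_loss(pred: torch.Tensor,        # (B, N, D) predicted raw patch values
                            target: torch.Tensor,      # (B, N, D) ground-truth raw patch values
                            mask: torch.Tensor         # (B, N) 1 = masked
                            ) -> torch.Tensor:
    """Mean squared error computed only on the masked patches."""
    loss = (pred - target).pow(2).mean(dim=-1)         # (B, N) per-patch error
    return (loss * mask).sum() / mask.sum().clamp(min=1)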
“…For self-supervised pre-training on 3D point clouds, masked autoencoding has not been widely adopted. Similar to BEiT, Point-BERT [59] utilizes dVAE to map 3D patches to tokens for masked point modeling, but heavily relies on contrastive learning [21], complicated data augmentation, and the costly two-stage pre-training. In contrast, our Point-M2AE is a pure masked autoencoding method with one-stage pre-training, and follows MAE to reconstruct the input signals without dVAE mapping.…”
Section: Related Work
confidence: 99%
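The quote distinguishes Point-BERT's dVAE-token targets from a pure masked-autoencoding scheme that regresses the masked input points directly. A common way to score such a coordinate reconstruction is the symmetric Chamfer distance; the sketch below is an illustrative assumption of that loss, not code from Point-M2AE or Point-BERT.

# Minimal sketch of a symmetric Chamfer distance between point sets.
import torch

def chamfer_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred:   (B, N, 3) reconstructed points of masked patches
    target: (B, M, 3) ground-truth points of those patches"""
    diff = pred.unsqueeze(2) - target.unsqueeze(1)     # (B, N, M, 3) pairwise differences
    dist = diff.pow(2).sum(dim=-1)                     # (B, N, M) squared distances
    pred_to_target = dist.min(dim=2).values.mean(dim=1)  # nearest target for each prediction
    target_to_pred = dist.min(dim=1).values.mean(dim=1)  # nearest prediction for each target
    return (pred_to_target + target_to_pred).mean()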
“…Since our main goal is not to develop a general backbone for point clouds, we simply work with a standard transformer for shape autoencoding (first-stage training). Similarly, both PointBERT [52] and PointMAE [28] use standard transformers for point cloud self-supervised learning.…”
Section: Neural Shape Representations
confidence: 99%
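For context, a "standard transformer" over point-patch embeddings can be assembled from stock PyTorch modules, as in the sketch below; the width, depth, and head count are illustrative assumptions rather than the configurations used by PointBERT or PointMAE.

# Minimal sketch: a plain Transformer encoder applied to point-patch embeddings.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=384, nhead=6,
                                           dim_feedforward=1536,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)

patch_embeddings = torch.randn(2, 64, 384)   # (batch, patches, channels)
features = encoder(patch_embeddings)         # (2, 64, 384) contextualized patch features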