2023
DOI: 10.48550/arxiv.2303.11325
Preprint

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

Abstract: Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between the LiDAR BEV features and the camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in …

Citations: cited by 1 publication (1 citation statement)
References: 38 publications
“…Recently 3D object detection from surround-view images has attracted much attention and achieved great progress, due to its advantages of low deployment cost and rich semantic information. Based on feature representation, existing methods (Wang et al 2021; Liu et al 2022a; Huang and Huang 2022; Jiang et al 2023; Liu et al 2022b; Li et al 2022c; Yang et al 2023; Park et al 2022; Wang et al 2023a; Zong et al 2023; Liu et al 2023) can be largely classified into BEV-based methods and sparse-query based methods.…”
Section: Surround-view 3D Object Detection
Type: mentioning
confidence: 99%