2022
DOI: 10.48550/arxiv.2203.11496
Preprint

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

Cited by 5 publications (27 citation statements) | References 0 publications
“…Besides this, the current surge in research on computer vision with transformers can contribute to improving its performance in the future, which, in turn, will improve our tracker. One example of ongoing research in the area of transformer-based object detection is the very recent publication [29], which reaches state-of-the-art results for LiDAR-based object detection and can be used as a building block within our proposed tracking model.…”
Section: Discussion
confidence: 99%
“…We investigate and evaluate existing popular LiDAR-camera fusion methods with open-source code on our benchmark, including PointAugmenting [34], MVX-Net [31], and TransFusion [1]. In addition, we evaluate a LiDAR-only method, CenterPoint [20], and a camera-only method, DETR3D [38], for better comparison.…”
Section: Benchmark Existing Methods
confidence: 99%
“…Specifically, these methods rely on the LiDAR-to-world and camera-to-world calibration matrices to project a LiDAR point onto the image plane, where it serves as a query of image features [33,34,31,40,8,44]. Deep fusion methods extract deep features from pre-trained neural networks for both modalities in a unified space [1,12,9,4,45,16,15], where a popular choice of such a space is the bird's eye view (BEV) [1,45]. While both early and deep fusion mechanisms usually occur within a neural network pipeline, the late fusion scheme usually contains two independent perception models that generate 3D bounding box predictions for both modalities, then fuses these predictions using post-processing techniques [4,21].…”
Section: Related Work
confidence: 99%
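The early-fusion projection step described in the excerpt above reduces to a chain of rigid transforms followed by a perspective projection. The sketch below is a minimal, illustrative implementation assuming plain NumPy arrays for the calibration data; the names lidar_to_world, camera_to_world, and intrinsics are hypothetical and do not correspond to the API of any of the cited methods.

```python
import numpy as np

def project_lidar_to_image(points_lidar: np.ndarray,
                           lidar_to_world: np.ndarray,
                           camera_to_world: np.ndarray,
                           intrinsics: np.ndarray) -> np.ndarray:
    """Project (N, 3) LiDAR points to (N, 2) pixel coordinates.

    lidar_to_world, camera_to_world: 4x4 homogeneous calibration matrices.
    intrinsics: 3x3 camera intrinsic matrix.
    """
    n = points_lidar.shape[0]
    # Homogeneous LiDAR coordinates: (N, 4)
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    # LiDAR -> world -> camera (world-to-camera is the inverse extrinsic)
    world_to_camera = np.linalg.inv(camera_to_world)
    pts_cam = (world_to_camera @ lidar_to_world @ pts_h.T)[:3]   # (3, N)
    # Perspective projection with the intrinsic matrix, then divide by depth.
    # Points behind the camera would normally be filtered out; here the depth
    # is only clipped to avoid division by zero.
    pix = intrinsics @ pts_cam                                   # (3, N)
    pix = pix[:2] / np.clip(pix[2:3], 1e-6, None)
    return pix.T                                                  # (N, 2)
```

In such early-fusion schemes, the resulting pixel coordinates are then used to index or bilinearly sample the image feature map, attaching image features to each LiDAR point before detection.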