TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

Bai, Xuyang; Hu, Zeyu; Zhu, Xinge; Huang, Qingqiu; Chen, Yilun; Fu, Hengzhi; Tai, Chiew-Lan

doi:10.48550/arxiv.2203.11496

Cited by 5 publications (27 citation statements)

References 0 publications

“…Besides this, the current surge in research on computer vision with transformers can contribute to improve its performance in the future, which, in turn, will improve our tracker. One example for ongoing research in the area of transformer-based object detection is the very recent publication [29], which reaches state of the art results for lidar-based object detection and can be used as a building block within our proposed tracking model.…”

Section: Discussion (mentioning; confidence: 99%)

Transformers for Multi-Object Tracking on Point Clouds

Ruppel, Faion, Gläser, et al. (2022). Preprint.

We present TransMOT, a novel transformer-based end-to-end trainable online tracker and detector for point cloud data. The model utilizes a cross- and a self-attention mechanism and is applicable to lidar data in an automotive context, as well as other data types, such as radar. Both track management and the detection of new tracks are performed by the same transformer decoder module and the tracker state is encoded in feature space. With this approach, we make use of the rich latent space of the detector for tracking rather than relying on low-dimensional bounding boxes. Still, we are able to retain some of the desirable properties of traditional Kalman-filter based approaches, such as an ability to handle sensor input at arbitrary timesteps or to compensate frame skips. This is possible due to a novel module that transforms the track information from one frame to the next on feature level and thereby fulfills a similar task as the prediction step of a Kalman filter. Results are presented on the challenging real-world dataset nuScenes, where the proposed model outperforms its Kalman filter-based tracking baseline.
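The abstract likens TransMOT's frame-to-frame module to the prediction step of a Kalman filter. For reference, the classical step it is compared against can be sketched as below. This is a plain constant-velocity predict step, not TransMOT's feature-space module; the state layout [px, py, vx, vy], the function names, and the simplified process-noise term are assumptions chosen for illustration. Because the transition matrix is built from an arbitrary time step dt, the same function handles irregular sensor timestamps and skipped frames, which is the property the abstract says TransMOT retains.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an n x m matrix by an m x p matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def kalman_predict(x: List[float], p: Matrix, dt: float, q: float) -> Tuple[List[float], Matrix]:
    """Constant-velocity Kalman prediction for state [px, py, vx, vy] over an arbitrary dt.

    Hypothetical helper: the process noise is simplified to q * dt on the diagonal
    (an assumption, not a tuned noise model).
    """
    # Transition matrix F(dt): position advances by velocity * dt.
    f = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # Predicted mean: x' = F x
    x_pred = [sum(f[i][j] * x[j] for j in range(4)) for i in range(4)]
    # Predicted covariance: P' = F P F^T + Q
    fpft = matmul(matmul(f, p), transpose(f))
    p_pred = [[fpft[i][j] + (q * dt if i == j else 0.0) for j in range(4)] for i in range(4)]
    return x_pred, p_pred
```

For example, predicting over dt = 0.5 from position (0, 0) with velocity (1, 2) moves the mean to (0.5, 1.0); a skipped frame is handled by passing the larger dt instead of running an extra step.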

“…We investigate and evaluate existing popular LiDAR-camera fusion methods with opening source code on our benchmark, including PointAugmenting [34], MVX-Net [31], and TransFusion [1]. In addition, we also evaluate a LiDAR-only method, CenterPoint [20], and a camera-only method, DETR3D [38], for better comparison.…”

Section: Benchmark Existing Methods (mentioning; confidence: 99%)

“…Specifically, these methods rely on the LiDAR-to-world and camera-to-world calibration matrix to project a LiDAR point on the image plane, where it serves as a query of image features [33,34,31,40,8,44]. Deep fusion methods extract deep features from some pre-trained neural networks for both modalities under a unified space [1,12,9,4,45,16,15], where a popular choice of such space is the bird's eye view (BEV) [1,45]. While both early and deep fusion mechanisms usually occur within a neural network pipeline, the late fusion scheme usually contains two independent perception models to generate 3D bounding box predictions for both modalities, then fuse these predictions using post-processing techniques [4,21].…”

Section: Related Work (mentioning; confidence: 99%)
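The projection step described in the excerpt above can be sketched as follows, assuming 4x4 homogeneous extrinsics (LiDAR-to-world and camera-to-world, as named in the excerpt), a 3x3 pinhole intrinsic matrix K, and a camera frame whose z axis points forward. The intrinsic matrix, these frame conventions, and the function names are assumptions not stated in the excerpt; this is an illustrative sketch, not code from any of the cited methods.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix by a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] as [R^T | -R^T t]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [t[0][3], t[1][3], t[2][3]]
    neg = [-sum(r_t[i][j] * trans[j] for j in range(3)) for i in range(3)]
    return [r_t[0] + [neg[0]], r_t[1] + [neg[1]], r_t[2] + [neg[2]], [0.0, 0.0, 0.0, 1.0]]


def project_lidar_point(point: Tuple[float, float, float],
                        lidar_to_world: Matrix,
                        camera_to_world: Matrix,
                        intrinsics: Matrix) -> Optional[Tuple[float, float]]:
    """Project one LiDAR point to pixel coordinates (u, v); None if behind the camera."""
    # LiDAR frame -> world frame.
    p_world = mat_vec(lidar_to_world, [point[0], point[1], point[2], 1.0])
    # World frame -> camera frame, using the inverse of camera-to-world.
    p_cam = mat_vec(invert_rigid(camera_to_world), p_world)
    x, y, z = p_cam[:3]
    if z <= 1e-6:
        return None  # point lies behind (or on) the image plane
    # Pinhole projection with the assumed intrinsic matrix, then perspective divide.
    u, v, w = mat_vec(intrinsics, [x, y, z])
    return (u / w, v / w)
```

For instance, with identity extrinsics and K = [[1000, 0, 800], [0, 1000, 450], [0, 0, 1]], the point (1, 0.5, 10) lands at pixel (900, 500); the image feature at that location is what the excerpt describes the projected point querying.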

“…TransFusion [1] evaluates the robustness of different fusion strategies under three scenarios: splitting validation set into daytime and nighttime, randomly dropping images for each frame, misaligning LiDAR and camera calibration by randomly adding a translation offset to the transformation matrix from camera to LiDAR sensor. However, TransFusion [1] mainly explores the robustness against camera inputs, and ignores the noisy LiDAR and temporal misalignment cases. DeepFusion [12] examines the model robustness by adding noise to LiDAR reflections and camera pixels.…”

Section: Related Work (mentioning; confidence: 99%)
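The two camera-side perturbations attributed to TransFusion in the excerpt above (randomly dropping images, and adding a random translation offset to the camera-to-LiDAR transform) can be sketched as below. The excerpt does not give the offset distribution or magnitude, so the independent uniform offset per axis, the max_offset_m and drop_prob parameters, and the function names are hypothetical.

```python
import copy
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")
Matrix = List[List[float]]


def misalign_calibration(cam_to_lidar: Matrix, max_offset_m: float,
                         rng: random.Random) -> Matrix:
    """Return a copy of a 4x4 camera-to-LiDAR transform with a random translation offset.

    Assumption: the rotation block is left untouched and each of tx, ty, tz receives an
    independent uniform offset in [-max_offset_m, max_offset_m].
    """
    noisy = copy.deepcopy(cam_to_lidar)  # never mutate the clean calibration
    for row in range(3):
        noisy[row][3] += rng.uniform(-max_offset_m, max_offset_m)
    return noisy


def drop_images(images: Sequence[T], drop_prob: float,
                rng: random.Random) -> List[Optional[T]]:
    """Replace each camera image with None independently with probability drop_prob."""
    return [None if rng.random() < drop_prob else img for img in images]
```

Consistent with the toolkit excerpt that follows, only the sensor inputs are perturbed here; ground-truth boxes would be passed through unchanged.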

“…To remove the randomness of benchmark comparison, we compose a toolkit that can transform any autonomous driving dataset into a robustness benchmark¹. In essence, we only simulate noisy data cases by altering the image and LiDAR data, the ground-truth annotation will remain the same as the 3D position of the object in the surrounding worlds will not change when the sensors malfunction.…”

Section: A Toolkit To Transform Generic Autonomous Driving Dataset In... (mentioning; confidence: 99%)

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

Yu, Tang, Xie, et al. (2022). Preprint.

There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR. The camera provides rich semantic information such as color, texture, and the LiDAR reflects the 3D shape and locations of surrounding objects. People discover that fusing these two modalities can significantly boost the performance of 3D perception models as each modality has complementary information to the other. However, we observe that current datasets are captured from expensive vehicles that are explicitly designed for data collection purposes, and cannot truly reflect the realistic data distribution due to various reasons. To this end, we collect a series of real-world cases with noisy data distribution, and systematically formulate a robustness benchmark toolkit, that simulates these cases on any clean autonomous driving datasets. We showcase the effectiveness of our toolkit by establishing the robustness benchmark on two widely-adopted autonomous driving datasets, nuScenes and Waymo, then, to the best of our knowledge, holistically benchmark the state-of-the-art fusion methods for the first time. We observe that: i) most fusion methods, when solely developed on these data, tend to fail inevitably when there is a disruption to the LiDAR input; ii) the improvement of the camera input is significantly inferior to the LiDAR one. We further propose an efficient robust training strategy to improve the robustness of the current fusion method. The benchmark and code are available at https://github.com/kcyu2014/lidar-camerarobust-benchmark.

BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

Liang, Xie, Yu, et al. (2022). Preprint.

Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks. Current methods rely on point clouds from the LiDAR sensor as queries to leverage the feature from the image space. However, people discover that this underlying assumption makes the current fusion framework infeasible to produce any prediction when there is a LiDAR malfunction, regardless of minor or major. This fundamentally limits the deployment capability to realistic autonomous driving scenarios. In contrast, we propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data, thus addressing the downside of previous methods. We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings. Under the robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses the state-of-the-art methods by 15.7% to 28.9% mAP. To the best of our knowledge, we are the first to handle realistic LiDAR malfunction and can be deployed to realistic scenarios without any post-processing procedure. The code is available at https://github.com/ADLab-AutoDrive/BEVFusion.

Recently, people have designed LiDAR-camera fusion deep networks to better leverage information from both modalities. Specifically, the majority of works can be summarized as follows: i) given one or a few points of the LiDAR point cloud, LiDAR to world transformation matrix and the essential matrix (camera to world); ii) people transform the LiDAR points [41,44,45,44,16,57] or proposals …