Multi-modal three-dimensional (3D) object detection is a crucial technology for the safe and reliable operation of environment perception systems in autonomous driving. In this study, we propose a context clustering-based radar and camera fusion method for 3D object detection (ConCs-Fusion), which combines radar and camera sensors at the intermediate fusion level. We extract features from the heterogeneous sensors and feed them into the fusion module as feature point sets. Within the fusion module, context cluster blocks learn multi-scale features of the radar point clouds and images, after which the feature maps are upsampled and fused. A multi-layer perceptron then nonlinearly transforms the fused features, reducing their dimensionality to improve model inference speed. Within each context cluster block, feature points that belong to the same object but originate from different sensors are aggregated into one cluster according to their similarity. All feature points in a cluster are fused into a single radar–camera feature point, which is then adaptively dispatched back to the feature points originally extracted from each individual sensor. Unlike previous methods that treat radar merely as an auxiliary sensor to the camera, or vice versa, ConCs-Fusion achieves bidirectional cross-modal fusion between radar and camera. Finally, extensive experiments on the nuScenes dataset demonstrate that ConCs-Fusion outperforms comparable methods in 3D object detection.
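
To make the cluster-aggregate-dispatch step concrete, the following is a minimal sketch of a context-cluster style fusion block, assuming a PyTorch implementation. The class name `ContextClusterFusion`, the learnable cluster centers, the tensor shapes, and the hard cosine-similarity assignment are illustrative assumptions for exposition, not the authors' released code.

```python
# Minimal sketch of context-cluster fusion over a heterogeneous point set.
# All names, shapes, and the clustering heuristic are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextClusterFusion(nn.Module):
    """Fuse radar and camera feature points by similarity-based clustering.

    Both modalities are flattened into one point set; each point is assigned
    to its most similar cluster center, every cluster is aggregated into one
    fused radar-camera point, and that fused point is dispatched back to the
    cluster's original points, weighted by similarity.
    """

    def __init__(self, dim: int, num_centers: int = 16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim))  # learnable centers
        self.value = nn.Linear(dim, dim)     # value projection before aggregation
        self.dispatch = nn.Linear(dim, dim)  # projection of the fused point on dispatch

    def forward(self, radar_pts: torch.Tensor, cam_pts: torch.Tensor) -> torch.Tensor:
        # radar_pts: (B, Nr, D), cam_pts: (B, Nc, D) -> one heterogeneous set
        pts = torch.cat([radar_pts, cam_pts], dim=1)               # (B, N, D)
        sim = F.cosine_similarity(                                 # (B, N, K)
            pts.unsqueeze(2), self.centers.view(1, 1, -1, pts.size(-1)), dim=-1
        )
        assign = sim.argmax(dim=-1)                                # hard assignment (B, N)
        mask = F.one_hot(assign, self.centers.size(0)).float()    # (B, N, K)
        w = torch.sigmoid(sim) * mask                              # per-cluster weights

        v = self.value(pts)                                        # (B, N, D)
        # Aggregate each cluster into a single fused radar-camera point.
        cluster_sum = torch.einsum("bnk,bnd->bkd", w, v)           # (B, K, D)
        cluster_feat = cluster_sum / (w.sum(dim=1).unsqueeze(-1) + 1e-6)

        # Dispatch: redistribute each fused point back to its member points.
        fused_back = torch.einsum("bnk,bkd->bnd", w, self.dispatch(cluster_feat))
        return pts + fused_back                                    # residual update


# Usage: 128 radar points and 1024 camera points, 64-dim features.
fusion = ContextClusterFusion(dim=64)
out = fusion(torch.randn(2, 128, 64), torch.randn(2, 1024, 64))    # (2, 1152, 64)
```

Because every point in a cluster, regardless of its source sensor, both contributes to and receives from the fused point, the update flows in both directions across modalities, which is the sense in which the fusion is bidirectional.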