Radar Voxel Fusion for 3D Object Detection

Nobis, Felix; Shafiei, Ehsan; Karle, Phillip; Betz, Johannes; Lienkamp, Markus

doi:10.3390/app11125598

Cited by 39 publications

(13 citation statements)

References 51 publications

“…This operation causes information loss. However, works like [84] directly fused camera, LiDAR, and radar inputs without initial processing.…”

Section: A Data (Early) Fusion

mentioning

confidence: 99%

“…

| Method | Dataset | Input | Fusion Type | mAP-BEV/NDS (%) | mAP-3D/mAP (%) |
|---|---|---|---|---|---|
| MV3D [4] | KITTI [32] | RGB image & LiDAR | early, middle, late | - | - |
| PointFusion [85] | KITTI [32] | RGB image & LiDAR | early | - | 40.13 |
| AVOD-FPN [5] | KITTI [32] | RGB image & LiDAR | middle | 64.03 | 55.63 |
| SAANET [111] | KITTI [32] | RGB image & LiDAR | middle | - | 52.5 |
| 3D-CVF [58] | KITTI [32] | RGB image & LiDAR | middle (gated based) | - | - |
| CenterFusion [109] | nuScenes [30] | RGB image & radar | middle | 45.30 | 33.20 |
| RVF-Net [84] | nuScenes [30] | RGB image, radar, & LiDAR | early | 54.86 | - |
| FusionNet [96] | custom [96] | RGB image & radar | early | - | 73.5 |
| Meyer et al. [82] | Astyx [40] | RGB image & radar | late | - | 48.0 |
| VPFNet [124] | KITTI [32] | RGB | | | |

If we have a deeper network, we lose much information at each layer because of this operation. Coming up with a new idea to replace a pooling operation with an equivalent operation that does not cause information loss or build a reconstruction/upsampling layer that can fully reconstruct the lost features with no loss, such as using wavelets, may help to increase the detection performance.…”

Section: Other Fusion Techniques

mentioning

confidence: 99%

“…The output on the dataset shows that radar and camera fusion performs better than LiDAR and camera fusion. Nobis et al. [84] proposed a LiDAR, camera, and radar fusion model, RadarVoxelFusionNet (RVF-Net), for 3D object detection. The LiDAR data points are projected into the image space and fused with camera images to simulate a depth camera and generate 3D points.…”

mentioning

confidence: 99%
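The projection step mentioned in the statement above can be illustrated with a standard pinhole camera model. This is a minimal sketch, not the RVF-Net implementation: the function name `project_points_to_image`, the intrinsic matrix `K`, and the sample points are all hypothetical, and the points are assumed to be already expressed in the camera frame with z pointing forward.

```python
import numpy as np


def project_points_to_image(points_xyz, intrinsics, image_size):
    """Project 3D points (camera frame, z forward) onto the image plane.

    Returns pixel coordinates, depths, and the indices of the points that
    land inside the image. Illustrative helper, not taken from the paper.
    """
    width, height = image_size
    pts = np.asarray(points_xyz, dtype=float)

    # Points behind the camera cannot be seen, so drop them first.
    in_front = pts[:, 2] > 0.0
    idx = np.nonzero(in_front)[0]
    pts = pts[in_front]

    # Homogeneous pixel coordinates, then the perspective divide.
    uvw = pts @ intrinsics.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Keep only projections that fall inside the image bounds.
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv[inside], pts[inside, 2], idx[inside]


# Assumed intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],     # on the optical axis
                   [1.0, 0.0, 10.0],     # 1 m to the right
                   [0.0, 0.0, -5.0],     # behind the camera
                   [100.0, 0.0, 10.0]])  # outside the field of view
uv, depth, kept = project_points_to_image(points, K, image_size=(100, 100))
```

Each kept point now has a pixel location and a depth, which is the information needed to attach range measurements to image pixels. A real pipeline would first apply the sensor-to-camera extrinsic transform, which is omitted here.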

A Comprehensive Survey of Deep Learning Multisensor Fusion-based 3D Object Detection for Autonomous Driving: Methods, Challenges, Open Issues, and Future Directions

Alaba

2022

Preprint

Autonomous driving requires accurate, robust, and fast decision-making perception systems to understand the driving environment. Object detection is critical in allowing the perception system to understand the environment. The perception systems, especially 2D object detection and classification, have succeeded because of the emergence of deep learning (DL) in computer vision (CV) applications. However, 2D object detection lacks depth information, which is crucial to understanding the driving environment. Therefore, 3D object detection is fundamental for the perception system of autonomous driving and robotics applications to estimate the objects’ location and understand the driving environment. The CV community has been giving much attention recently to 3D object detection because of the growth of DL models and the need to know accurate locations of objects.

However, 3D object detection is still challenging because of scale changes, the lack of 3D sensor information, and occlusions. Researchers have been using multiple sensors to solve these problems and further enhance the performance of the perception system. This survey presents the multisensor (camera, radar, and LiDAR) fusion-based 3D object detection methods. The fully autonomous vehicles need to be equipped with multiple sensors for robust and reliable driving. Camera, LiDAR, and radar sensors and their corresponding advantages and disadvantages are also presented. Then, relevant datasets are summarized, and state-of-the-art multisensor fusion-based methods are reviewed. Finally, challenges, open issues, and possible research directions are presented.

“…These point cloud based networks can be further differentiated into grid-based and point-based architectures. Grid-based approaches first render the point cloud into a 2D bird's-eye view (BEV) or 3D voxel grid using hand-crafted operations [11], [28], [29], [30], [31] or learned feature-encoders [32], [12], [31] and subsequently apply convolutional backbones to the grid.…”

Section: A Radar Object Detection

mentioning

confidence: 99%
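As an illustration of the hand-crafted grid rendering described in the statement above, the following sketch counts points per cell of a 2D BEV grid. It is one simple possibility under stated assumptions, not the encoder of any cited work: the function name `points_to_bev_counts`, the grid ranges, and the cell size are hypothetical.

```python
import numpy as np


def points_to_bev_counts(points_xyz, x_range, y_range, cell_size):
    """Render a point cloud into a 2D bird's-eye-view grid of point counts.

    The height (z) coordinate is ignored; each cell stores how many points
    fall into it. Illustrative helper with assumed parameters.
    """
    pts = np.asarray(points_xyz, dtype=float)
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_x = int(round((x_max - x_min) / cell_size))
    n_y = int(round((y_max - y_min) / cell_size))

    # Discard points outside the grid extent.
    in_range = (
        (pts[:, 0] >= x_min) & (pts[:, 0] < x_max)
        & (pts[:, 1] >= y_min) & (pts[:, 1] < y_max)
    )
    pts = pts[in_range]

    # Map metric coordinates to integer cell indices.
    ix = np.floor((pts[:, 0] - x_min) / cell_size).astype(int)
    iy = np.floor((pts[:, 1] - y_min) / cell_size).astype(int)
    ix = np.clip(ix, 0, n_x - 1)  # guard against floating-point edge cases
    iy = np.clip(iy, 0, n_y - 1)

    # np.add.at accumulates correctly when several points share a cell.
    grid = np.zeros((n_x, n_y), dtype=np.int64)
    np.add.at(grid, (ix, iy), 1)
    return grid


points = np.array([[0.5, 0.5, 0.0],
                   [0.6, 0.4, 1.2],   # same cell as the first point
                   [3.5, 1.5, 0.3],
                   [9.0, 9.0, 0.0]])  # outside the grid, dropped
grid = points_to_bev_counts(points, x_range=(0.0, 4.0),
                            y_range=(0.0, 2.0), cell_size=1.0)
```

The resulting array can be treated like a single-channel image and passed to a convolutional backbone. Learned feature encoders replace the fixed counting step with a small network applied to the points in each cell.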

Self-Supervised Velocity Estimation for Automotive Radar Object Detection Networks

Niederlöhner, Ulrich, Braun, et al.

2022

Preprint

“…In the field of target detection [35][36][37], recall and precision are mainly used as the performance measure of the algorithm. Precision (P) and recall (R) are, respectively, defined as follows: $R = \frac{TP}{TP + FN}$,…”

Section: Algorithm Performance Evaluation

mentioning

confidence: 99%
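The recall formula quoted above, together with the standard definition of precision, P = TP / (TP + FP), which is cut off in the excerpt, can be computed as in the sketch below. The function name and the example counts are hypothetical, and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from detection counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for a metric whose denominator is zero (assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up counts: 8 correct detections, 2 false alarms, 4 missed objects.
p, r = precision_recall(tp=8, fp=2, fn=4)
```

With these counts, precision is 8/10 and recall is 8/12: the detector is right about most of what it reports but misses a third of the objects.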

MANet: End-to-End Learning for Point Cloud Based on Robust Pointpillar and Multiattention

Huo, Shi, Yang, et al.

2022

Wireless Communications and Mobile Computing

Detecting 3D objects in a crowd remains a challenging problem, since cars and pedestrians often gather together and occlude each other in the real world. Pointpillar is a leading method in 3D object detection: its detection process is simple and its detection speed is fast. However, because it uses max pooling in the Voxel Feature Encoding (VFE) stage to extract global features, fine-grained features are lost, which leaves the feature pyramid network (FPN) stage with insufficient feature expression ability, so the detection of small targets is not accurate enough. This paper proposes to improve the detection performance of networks in complex environments by integrating attention mechanisms into Pointpillar. In the VFE stage of the model, a mixed-attention module (HA) is added to retain the spatial structure information of the point cloud to the greatest extent from three perspectives: local space, global space, and points. The Convolutional Block Attention Module (CBAM) is embedded in the FPN to mine the deep information of the pseudo-images. Experiments on the KITTI dataset show that the method performs better than other state-of-the-art single-stage algorithms. In crowd scenes, the mean average precision (mAP) under the bird's-eye view (BEV) detection benchmark increased from 59.20% for Pointpillar and 66.19% for TANet to 69.91% for the proposed method, the mAP under the 3D detection benchmark increased from 62% for TANet to 65.11%, and the detection speed dropped only from 13.1 fps for Pointpillar to 12.8 fps.
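The loss of fine-grained features under max pooling, as described in the abstract above, can be seen in a toy example. This is a minimal sketch with made-up feature values, not the Pointpillar or MANet code: two hypothetical pillars hold different per-point features, yet element-wise max pooling over the points gives the same pooled vector for both.

```python
import numpy as np

# Each row is one point, each column one feature channel (made-up values).
pillar_a = np.array([[0.9, 0.1],
                     [0.2, 0.8],
                     [0.1, 0.1]])
pillar_b = np.array([[0.9, 0.8],
                     [0.0, 0.0],
                     [0.0, 0.0]])

# Max pooling over the point axis keeps only the largest value per channel.
pooled_a = pillar_a.max(axis=0)
pooled_b = pillar_b.max(axis=0)

# The two pillars differ, but their pooled features are identical.
same_after_pooling = np.array_equal(pooled_a, pooled_b)
```

After pooling, the network can no longer tell whether the strong responses came from one point or from several, which is the kind of spatial detail the attention modules in the abstract aim to retain.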
