MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of LiDAR Data

Laddha, Ankit; Gautam, Shivam; Palombo, Stefan; Pandey, Shreyash; Vallespi-Gonzalez, Carlos

doi:10.1109/cvprw53098.2021.00321

Cited by 25 publications

(15 citation statements)

References 25 publications

“…Forecasting: Most sensor fusion works consider perception tasks, e.g. object detection [14]-[16], [18]-[23], [47]-[60] and motion forecasting [24]-[30], [49], [61], [62]. They operate on multi-view LiDAR, e.g.…”

Section: Sensor Fusion Methods For Object Detection and Motion (mentioning)

confidence: 99%

TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving

Chitta, Prakash, Jaeger et al. 2022

Preprint

How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.

“…An enhancement model to MultiXNet [64] is proposed by Fadadu et al. [38]. The model in [38] … Lastly, MVFuseNet [88] implements perception and motion forecasting by fusing sequential LIDAR data in both BEV and RV forms, in addition to HD map features. Unlike [38], MVFuseNet performs spatio-temporal fusion of both BEV and RV features for multiple frames.…”

Section: Predictions Using Fusion Of Lidar and Camera Sensors (mentioning)

confidence: 99%

“…MVFuseNet reports improved performance over [38] in perception and motion prediction across all object categories. However, again, 3D object-level predictions were computed in both [38,88]. It is evident that no work has yet been conducted that investigates pixel-wise joint perception and motion prediction using multi-modal fusion, which is essential for small and distant objects as they provide fine-grained, pixel level precision.…”

Section: Predictions Using Fusion Of Lidar and Camera Sensors (mentioning)

confidence: 99%

“…It is evident that no work has yet been conducted that investigates pixel-wise joint perception and motion prediction using multi-modal fusion, which is essential for small and distant objects as they provide fine-grained, pixel level precision. The datasets used to evaluate the experiments in [38,88] are the public nuScenes and internal private datasets.…”

Section: Predictions Using Fusion Of Lidar and Camera Sensors (mentioning)

confidence: 99%

Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning

Khalil, Mouftah 2023

IEEE Trans. Veh. Technol.

“…A line of works [4,18] realize multi-view fusion either by aggregating features to refine proposals or fusing features in the region constrained by the spatial projection. [7,17] fuse the ROI features from point cloud and camera image for proposals refinement.…”

Section: Multi-view 3d Detection (mentioning)

confidence: 99%

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Deng, Liang, Sun et al. 2022

Preprint

Detecting objects from LiDAR point clouds is of tremendous significance in autonomous driving. In spite of good progress, accurate and reliable 3D detection is yet to be achieved due to the sparsity and irregularity of LiDAR point clouds. Among existing strategies, multi-view methods have shown great promise by leveraging the more comprehensive information from both bird's eye view (BEV) and range view (RV). These multi-view methods either refine the proposals predicted from single view via fused features, or fuse the features without considering the global spatial context; their performance is limited consequently. In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one. Thanks to the learned attention mechanism, VISTA can produce fused features of high quality for prediction of proposals. We decouple the classification and regression tasks in VISTA, and an additional constraint of attention variance is applied that enables the attention module to focus on specific targets instead of generic points. We conduct thorough experiments on the benchmarks of nuScenes and Waymo; results confirm the efficacy of our designs. At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-crucial categories such as cyclist.
