2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8967624
Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles

Abstract: We address the problem of 3D object detection from 2D monocular images in autonomous driving scenarios. We propose to lift the 2D images to 3D representations using learned neural networks and leverage existing networks working directly on 3D data to perform 3D object detection and localization. We show that, with a carefully designed training mechanism and automatically selected, minimally noisy data, such a method is not only feasible, but gives higher results than many methods working on actual 3D inputs acqui…
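The lifting idea the abstract describes can be illustrated with a minimal sketch: back-project each pixel of a (predicted) depth map into camera space, then rasterize the points into a bird's-eye-view grid that a 3D detector could consume. This is an illustrative assumption, not the paper's actual architecture; `lift_to_bev`, the intrinsics, and all ranges are hypothetical.

```python
import numpy as np

def lift_to_bev(depth, intrinsics, bev_shape=(64, 64),
                x_range=(-20.0, 20.0), z_range=(0.0, 40.0)):
    """Back-project a per-pixel depth map into camera space and rasterize
    the resulting 3D points into a bird's-eye-view (BEV) occupancy grid."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(intrinsics).T      # normalized camera rays
    pts = rays * depth.reshape(-1, 1)             # 3D points: x right, y down, z forward
    # Discretize the (x, z) ground-plane coordinates into grid cells.
    xs = np.clip(((pts[:, 0] - x_range[0]) / (x_range[1] - x_range[0])
                  * bev_shape[1]).astype(int), 0, bev_shape[1] - 1)
    zs = np.clip(((pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0])
                  * bev_shape[0]).astype(int), 0, bev_shape[0] - 1)
    bev = np.zeros(bev_shape)
    np.add.at(bev, (zs, xs), 1.0)                 # accumulate point counts per cell
    return bev
```

In a full pipeline of this kind, a learned network would supply `depth` from the monocular image, and an existing 3D/BEV detection network would then operate on `bev` (or a richer multi-channel projection) exactly as it would on LiDAR-derived input.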


Cited by 43 publications (14 citation statements)
References 31 publications
“…In addition, we want to work towards a more complete system that allows SLAM to succeed in videos with more challenges, e.g. motion blurs and rolling shutter distortion, as well as explore SfM [45], 3D object localization [46] and top-view mapping [13] in unconstrained scenarios.…”
Section: Discussion
Confidence: 99%
“…A representative model is Mono3D (Chen et al, 2016), which uses additional instance and semantic segmentation along with further features to reason about the pose and location of objects in 3D space. Srivastava et al (2019) modify the generative adversarial network of BirdNet (Beltrán et al, 2018) to create a BEV projection from the monocular representation. All following operations such as feature extractions and predictions are then performed on this new representation.…”
Section: Informed Monocular Approaches
Confidence: 99%
“…Robust models like VoxelNet, PointNet, and RoarNet were able to process 3D sensory data, combined video, and LiDAR information [30][31][32]. The popular methods of 3D detection can be roughly classified as Monocular Image-Based Methods, Point Cloud-Based Methods, and Fusion-Based Methods, which work by extrapolating 2D bounding boxes, generating the 3D representation of the point cloud, and fusing front view images and point clouds, respectively [29,33,34]. However, these methods are computationally expensive and require more time to execute as compared to 2D detectors.…”
Section: Object Detection
Confidence: 99%