2021
DOI: 10.48550/arxiv.2104.03775
Preprint

Geometry-based Distance Decomposition for Monocular 3D Object Detection

Abstract: Monocular 3D object detection is of great significance for autonomous driving but remains challenging. The core challenge is to predict the distance of objects in the absence of explicit depth information. Unlike regressing the distance as a single variable in most existing methods, we propose a novel geometry-based distance decomposition to recover the distance by its factors. The decomposition factors the distance of objects into the most representative and stable variables, i.e. the physical height and the …
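
As a concrete illustration of the kind of decomposition the abstract describes, the sketch below applies the standard pinhole-camera relation, which recovers an object's distance from its physical height, its projected visual height, and the camera focal length. The function name and the example numbers are illustrative assumptions, not the paper's exact formulation.

    def distance_from_heights(focal_length_px: float,
                              physical_height_m: float,
                              projected_height_px: float) -> float:
        """Pinhole-camera relation: Z = f * H / h.

        focal_length_px:     camera focal length in pixels (f)
        physical_height_m:   object's physical height in meters (H)
        projected_height_px: object's visual height in the image in pixels (h)
        Returns the distance Z along the optical axis, in meters.
        """
        return focal_length_px * physical_height_m / projected_height_px

    # Example: a 1.5 m tall car spanning 45 px under a 720 px focal length
    # is recovered at 720 * 1.5 / 45 = 24 m.
    print(distance_from_heights(720.0, 1.5, 45.0))  # 24.0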

Cited by 6 publications (12 citation statements)
References 31 publications
“…The learning procedure is thus called auxiliary learning, in contrast to multitask learning, in which all tasks used in training are also of interest at inference. Auxiliary tasks and auxiliary learning have shown many successful applications in computer vision (Zhang et al 2014; Mordan et al 2018; Liu, Davison, and Johns 2019; Ye et al 2021; Valada, Radwan, and Burgard 2018). Although simple, exploiting comprehensive 2D auxiliary tasks has not been studied well in monocular 3D object detection.…”
Section: Related Work and Our Contributions
confidence: 99%
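
For readers unfamiliar with the distinction this statement draws, the sketch below shows the usual shape of auxiliary learning: auxiliary losses supervise extra heads during training, and those heads are simply not evaluated at inference. The module sizes, head names, and the 0.4 weighting are illustrative assumptions, not the cited papers' implementations.

    import torch
    import torch.nn as nn

    class DetectorWithAuxHead(nn.Module):
        # Stand-in backbone and heads; dimensions are placeholders.
        def __init__(self, feat_dim: int = 64):
            super().__init__()
            self.backbone = nn.Linear(128, feat_dim)
            self.main_head = nn.Linear(feat_dim, 7)  # 3D box parameters (kept at inference)
            self.aux_head = nn.Linear(feat_dim, 4)   # e.g. 2D box (training-only auxiliary task)

        def forward(self, x, with_aux: bool = False):
            feat = self.backbone(x)
            if with_aux:                             # training: main + auxiliary outputs
                return self.main_head(feat), self.aux_head(feat)
            return self.main_head(feat)              # inference: auxiliary head is dropped

    model = DetectorWithAuxHead()
    x = torch.randn(8, 128)
    main_out, aux_out = model(x, with_aux=True)
    main_target, aux_target = torch.randn(8, 7), torch.randn(8, 4)

    # Total training loss: main task plus a down-weighted auxiliary task.
    loss = nn.functional.mse_loss(main_out, main_target) \
         + 0.4 * nn.functional.mse_loss(aux_out, aux_target)
    loss.backward()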
“…3D object detection is a critical component in many practical computer vision applications, such as autonomous driving and robot navigation. High-performing methods often require more costly system setups, such as Lidar sensors (Yan, Mao, and Li 2018; Lang et al 2019; Qi et al 2019) for precise depth measurements or stereo cameras (Li, Chen, and Shen 2019; Qin, Wang, and Lu 2019; Xu …; Sun et al 2020) for stereo depth estimation, and are often more computationally expensive. To alleviate these burdens, and given the prospect of reduced cost and increased modular redundancy, monocular 3D object detection, which aims to localize 3D object bounding boxes from an input 2D image, has emerged as a promising alternative that has received much attention in the computer vision and AI community (Chen et al 2016; Manhardt, Kehl, and Gaidon 2019; Simonelli et al 2019; Brazil and Liu 2019; Wang et al 2020; Ye et al 2020; Shi, Chen, and Kim 2020; Luo et al 2021; Kumar, Brazil, and Liu 2021; Wang et al 2021c,d).…”
[Figure residue from the citing paper: 3D Detection Performance (AP_R40, IoU ≥ 0.7) comparing MonoRCNN (Shi et al 2021), DDMP3D (Wang et al 2021a), CaDDN (Reading et al 2021), MonoEF (Zhou et al 2021), MonoFlex (Zhang, Lu, and Zhou 2021), GUPNet (Lu et al 2021), and MonoCon (Ours)]
Section: Introduction
confidence: 99%
“…For 2D representation-based 3D detectors, an intuitive solution is to leverage a 2D object detector [3], [4]. Similar to 2D object detectors, prior work [8], [9], [11], [14], [15], [16], [56], [57], [58], [59] directly estimates 3D bounding boxes from camera images, relying on perspective modeling between the projected 2D object and its 3D counterpart. On the other hand, depth supervision via point clouds or depth maps is accessible during training.…”
Section: Lidar-based 3D Detection
confidence: 99%
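
The "perspective modeling" this statement refers to typically amounts to projecting 3D geometry through the camera intrinsics. A minimal sketch, assuming an illustrative pinhole intrinsic matrix K (the focal length and principal point below are made-up values):

    import numpy as np

    # Illustrative intrinsics: focal length 720 px, principal point (640, 360).
    K = np.array([[720.0,   0.0, 640.0],
                  [  0.0, 720.0, 360.0],
                  [  0.0,   0.0,   1.0]])

    def project(points_3d: np.ndarray) -> np.ndarray:
        """Project Nx3 camera-frame points to Nx2 pixel coordinates."""
        uvw = points_3d @ K.T           # homogeneous image coordinates
        return uvw[:, :2] / uvw[:, 2:]  # perspective division by depth

    # Top and bottom of a 1.5 m tall object centered 24 m ahead (y points down):
    pts = np.array([[0.0, -0.75, 24.0], [0.0, 0.75, 24.0]])
    uv = project(pts)
    print(uv[1, 1] - uv[0, 1])  # projected height h = f * H / Z = 720 * 1.5 / 24 = 45 px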
“…cos(δ), are zero, which could stall backpropagation at the extreme opposite output. To resolve this problem, researchers [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61] regressed both the cosine and sine of the angle, (cos(θ), sin(θ)). Converting a 1D scalar-based orientation θ into these two values maps it onto a 2D Cartesian system, with the angle's cosine projected onto the x-axis and its sine onto the y-axis.…”
Section: Alpha (Local/Allocentric Rotation)
confidence: 99%
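
The (cos(θ), sin(θ)) encoding described in this statement removes the wrap-around discontinuity of regressing θ directly; the predicted pair is decoded back to an angle with atan2. Below is a minimal sketch of the encode/decode round trip, not any single cited paper's exact loss.

    import math

    def encode(theta: float) -> tuple:
        """Map a scalar orientation to a point on the unit circle."""
        return math.cos(theta), math.sin(theta)

    def decode(c: float, s: float) -> float:
        """Recover the angle from (possibly unnormalized) predictions."""
        return math.atan2(s, c)

    # Angles just below +pi and just above -pi are far apart as scalars,
    # yet their (cos, sin) targets are nearly identical, so the regression
    # target stays continuous across the wrap-around.
    for theta in (math.pi - 0.01, -math.pi + 0.01):
        c, s = encode(theta)
        print(round(c, 4), round(s, 4), round(decode(c, s), 4))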