2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00549

Joint Monocular 3D Vehicle Detection and Tracking

Abstract: Vehicle 3D extents and trajectories are critical cues for predicting the future location of vehicles and planning future agent ego-motion based on those predictions. In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform. Our method leverages 3D box depth-o…

Citations: cited by 228 publications (134 citation statements)
References: 43 publications

“…Following methods continued to leverage Convolutional Neural Networks and focused only on Car instances. To regress 3D pose parameters from 2D detections, Deep3DBox [38], MonoGRnet [46], and Hu et al [23] used geometrical reasoning for 3D localization, while Multi-fusion [57] and ROI-10D [35] incorporated a module for depth estimation. Recently, Roddick et al [48] escaped the image domain by mapping image-based features into a birds-eye view representation using integral images.…”
Section: Related Work (mentioning)
confidence: 99%
“…Other solutions that provide detection and tracking rely on neural networks for the detection part, as well as for the tracking. In [21], the authors provide a method for tracking using Long Short Term Memory (LSTM) neural networks, but the main disadvantage is that it heavily depends on datasets and the availability of training data. The authors mention that the tracking is trained using imagery from realistic video games (synthetic data).…”
Section: Related Work (mentioning)
confidence: 99%
“…The solution from [21] employs a CNN for candidate region extraction, then uses another CNN for orientation and size estimation and, in the end, makes use of the LSTM neural network to track the detections. The main drawback is that it heavily relies on training data, the authors even mention that they used extensive synthetic images during the training and development of the approach.…”
Section: Comparison With Other Obstacle Detection Techniques (mentioning)
confidence: 99%
“…In the last decade, the widespread use of visual traffic surveillance systems has led to the rapid growth of video data that need to be processed. With the increasing amount of available video data, computer vision technology has been widely used in the field of intelligent transportation [1]-[3]. Vehicle fine-grained recognition has attracted more and more research interest [4]-[9], and its main goal is to identify the detailed information of a vehicle including make, model, submodel, etc.…”
Section: Introduction (mentioning)
confidence: 99%