2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00043

Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction

Abstract: Despite learning based methods showing promising results in single view depth estimation and visual odometry, most existing approaches treat the tasks in a supervised manner. Recent approaches to single view depth estimation explore the possibility of learning without full supervision via minimizing photometric error. In this paper, we explore the use of stereo sequences for learning depth and visual odometry. The use of stereo sequences enables the use of both spatial (between left-right pairs) and temporal (…

Citations: cited by 642 publications (499 citation statements)
References: 41 publications
“…However, learning to estimate depth purely from video breaks the static scene assumption and necessitates the use of an attention mechanism for foreground motion between consecutive frames. More recent iterations of this direction added scale normalization and removal of the separate pose estimation branch [40], 3D geometric constraints between the predicted depths [23], epipolar constraints [29], additional feature reconstruction supervision [44], stereo matching constraints [43] or explicitly used two consecutive frames as input [28].…”
Section: Related Work (mentioning)
confidence: 99%
“…Depth estimation from a single image has gained increasing attention in the computer vision community. Most works like [37,38,20,15,39,41,9,16] are proposed for indoor and outdoor scenes. We focus on depth estimation of humans, which allows us to build much stronger shape prior than these generic depth estimation methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…Leveraging temporal stereo sequences for unsupervised monocular depth and pose estimation, e.g. by warping deep features, improves the accuracy of both tasks [55]. With the same result, Zou et al [60] jointly train for optical flow, pose and depth estimation simultaneously while Jiao et al [23] mutually improve semantics and depth and GeoNet [53] jointly estimates depth, optical flow and camera pose from video.…”
Section: Monocular Vision (mentioning)
confidence: 96%