2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.700

Unsupervised Learning of Depth and Ego-Motion from Video

Abstract: We present an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences. In common with recent work [10,14,16], we use an end-to-end learning approach with view synthesis as the supervisory signal. In contrast to the previous work, our method is completely unsupervised, requiring only monocular video sequences for training. Our method uses single-view depth and multi-view pose networks, with a loss based on warping nearby views to the target using the computed depth and pose […]
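To make the view-synthesis supervision concrete, here is a minimal sketch of the warping-based photometric loss the abstract refers to, assuming PyTorch, a pinhole camera model, and a single source view. The function names (inverse_warp, photometric_loss) and tensor shapes are illustrative, not the authors' actual implementation.

```python
# Minimal sketch of view-synthesis supervision: warp a source view into the
# target frame using the predicted target depth and relative pose, then take
# an L1 photometric reconstruction loss. Assumes PyTorch >= 1.10.
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, pose_tgt2src, K):
    """Warp src_img into the target frame.

    src_img:      (B, 3, H, W) source view
    tgt_depth:    (B, 1, H, W) predicted depth of the target view
    pose_tgt2src: (B, 3, 4)    [R|t] from target camera to source camera
    K:            (B, 3, 3)    camera intrinsics
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Pixel grid of the target view in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project target pixels to 3D, then transform into the source camera.
    cam_points = torch.inverse(K) @ pix * tgt_depth.view(B, 1, -1)
    cam_points_h = torch.cat(
        [cam_points, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_points = pose_tgt2src @ cam_points_h  # (B, 3, H*W)

    # Project into the source image plane and normalize to [-1, 1] for grid_sample.
    src_pix = K @ src_points
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    u = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
    v = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)

    # Differentiable bilinear sampling of the source view.
    return F.grid_sample(src_img, grid, padding_mode="zeros", align_corners=True)

def photometric_loss(tgt_img, src_img, tgt_depth, pose_tgt2src, K):
    """L1 photometric reconstruction loss between the target view and the
    source view warped into the target frame."""
    warped = inverse_warp(src_img, tgt_depth, pose_tgt2src, K)
    return (tgt_img - warped).abs().mean()
```

In the paper this reconstruction term is summed over several nearby source views and combined with a smoothness term and an explainability mask; the sketch above only shows the core photometric term.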

Cited by 2,501 publications (3,352 citation statements)
References 55 publications
“…All the networks are jointly optimized during training, and they can then be applied independently at test time. For instance, [20] learns depth and ego-motion from monocular video in an unsupervised way. The CNNs are trained with a photometric reconstruction loss, obtained by warping nearby views to the target using the computed depth and pose.…”
Section: Unsupervised Learning Methods
confidence: 99%
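For reference, the warping this statement describes is the standard pinhole reprojection: a target pixel p_t is mapped into the source view via the predicted depth D̂_t and relative pose T̂_{t→s}. A sketch in homogeneous pixel coordinates with intrinsics K (not a verbatim quote from the paper):

$$ p_s \sim K \, \hat{T}_{t \to s} \, \hat{D}_t(p_t) \, K^{-1} \, p_t $$

The source view is then bilinearly sampled at p_s to reconstruct the target view, and the photometric difference between the reconstruction and the actual target image supervises both networks.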
“…We also consider using a real dataset for fine-tuning, based on UAV footage and either a preliminary, thorough offline 3D scan or ground-truth-free techniques (Zhou et al., 2017). This would allow us to measure the quality of our network on real footage quantitatively, rather than only subjectively as is done now.…”
Section: Discussion
confidence: 99%
“…Godard et al. [14] added a left-right consistency constraint to the loss function, exploiting another geometric cue. Zhou et al. [43] additionally learned the ego-motion of the scene, and GeoNet [41] also used the scene's optical flow. Wang et al. [37] recently showed that using direct visual odometry along with depth normalization substantially improves prediction performance.…”
Section: Related Work
confidence: 99%