2019
DOI: 10.1007/978-3-030-11021-5_43
Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding

Abstract: Learning to estimate 3D geometry in a single image by watching unlabeled videos via deep convolutional networks has made significant progress recently. Current state-of-the-art (SOTA) methods are based on the learning framework of rigid structure-from-motion, where only 3D camera ego-motion is modeled for geometry estimation. However, moving objects also exist in many videos, e.g. moving cars in a street scene. In this paper, we tackle such motion by additionally incorporating per-pixel 3D object motion into th…

Cited by 88 publications (101 citation statements) | References 66 publications
“…An estimate is considered an error if it exceeds … We propose four categories of state-of-the-art monocular baseline methods. In the first category are the multi-task networks GeoNet [73], DF-Net [77] and EveryPixel [70]. These CNNs are trained in an unsupervised manner and are able to provide single-view depth estimates for both images as well as optical flow estimates.…”
Section: Monocular Scene Flow
confidence: 99%
“…For GeoNet and DF-Net, their published code and models are used. The results of the EveryPixel approach are stated in their paper [70] (the D2 metric is excluded as it seems to be inconsistent). As a second category, single-view depth estimation ('LRC [19]' or 'DORN [14]') is combined with optical flow estimation ('MirrorFlow [30]' and 'HD³-F [72]'); since these methods used parts of the dataset for training, they are disregarded for ranking.…”
Section: Monocular Scene Flow
confidence: 99%
“…Conversely, in this work, we focus on the binocular stereo depth estimation scenario where both images are available at test time. Other works considered jointly learning the scene depth and the ego-motion in monocular videos without using ground-truth data [13], [31], [32], [33], [34]. These works demonstrated that, by integrating temporal information and considering multiple consecutive frames, better estimates can be obtained.…”
Section: Unsupervised Depth Estimation
confidence: 99%
“…With the rapid development of deep convolutional neural networks (CNNs), the most recent approaches [14], [15], [16], [17] address the joint estimation of rigid and non-rigid geometry. They introduce an unsupervised learning framework that estimates depth, optical flow and ego-motion of a camera in a coupled way.…”
Section: Scene Flow Estimation
confidence: 99%
“…To tackle this problem, we design an unsupervised learning framework that includes depth, pose and residual flow networks as shown in Figure 1. In contrast to the previous monocular-based approaches [14], [15], [16], [17], we introduce a stereo-based motion and residual flow learning module to handle non-rigid cases caused by dynamic objects in an unsupervised manner. Employing a stereo-based system makes it possible to resolve the ambiguity of 3D object motion in the monocular setting and to obtain reliable depth estimates that can facilitate the other tasks.…”
Section: Introduction
confidence: 99%