2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00575
Learning Single-Image Depth From Videos Using Quality Assessment Networks

Abstract: Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and co…
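The abstract outlines a filtering pipeline: run SfM on Internet videos, then use a learned quality-assessment model to keep only reliable reconstructions as depth supervision. Below is a minimal sketch of that idea, assuming a toy feature-based scorer in PyTorch; the feature set, network shape, threshold, and function names are illustrative assumptions, not the architecture described in the paper.

# Illustrative sketch only: a toy quality-assessment scorer that filters SfM
# reconstructions before using them as single-view depth supervision. The
# feature set, network size, and threshold are hypothetical assumptions, not
# the architecture described in the paper.
import torch
import torch.nn as nn

class QualityAssessmentNet(nn.Module):
    """Maps a per-reconstruction feature vector (e.g. reprojection-error
    statistics, number of tracked points) to a quality score in [0, 1]."""
    def __init__(self, num_features: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_features, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features).squeeze(-1)

def filter_reconstructions(net, features: torch.Tensor, threshold: float = 0.5):
    """Keep only reconstructions whose predicted quality exceeds the threshold."""
    with torch.no_grad():
        scores = net(features)
    return scores > threshold, scores

# Example: score 5 candidate reconstructions, each summarized by 8 statistics.
net = QualityAssessmentNet(num_features=8)
keep_mask, scores = filter_reconstructions(net, torch.randn(5, 8))
print(scores.tolist(), keep_mask.tolist())

In practice such a scorer would need supervision (for example, human quality labels) to be useful; the sketch only illustrates the filtering step, not how the network is trained.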

Cited by 71 publications (70 citation statements). References 60 publications (95 reference statements).
“…they are frozen in action while the camera moves through the scene). Chen et al. [39] propose an approach to automatically assess the quality of sparse SfM reconstructions in order to construct a large dataset. Wang et al. [33] build a large dataset from stereo videos sourced from the web, while Cho et al. [40] collect a dataset of outdoor scenes with handheld stereo cameras.…”
Section: Related Work
Mentioning confidence: 99%
“…Quantitative and qualitative results: In Table 4, we compare our MDP model with 7 state-of-the-art methods, including DIW [41], DL [5], RW [15], MD [42], Y3D [44], MC [43], and HRWSI [14]. For the definition of the metrics, please refer to [14].…”
Section: Methods
Mentioning confidence: 99%
“…In other words, these methods trained on one dataset often fail to produce promising predictions on a different one. To learn depth in general scenes with a single model, recent studies [15, 41, 42, 43, 44] start by constructing in-the-wild RGB-D datasets. For example, Chen et al. [41] propose the DIW dataset, which consists of about 495K natural images.…”
Section: Related Work
Mentioning confidence: 99%
“…Extracting a 3D model from photos can be achieved using a single image or multiple images covering different viewpoints [11–16, 34–36]. Standard depth and 3D prediction methods assume that light travels in straight lines [10, 37], an assumption that fails for transparent objects [1–3]. As a result, standard methods for extracting 3D models from images, such as stereo matching and structured light, fail on transparent objects.…”
Section: Related Work
Mentioning confidence: 99%