2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00296
Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer

Abstract: Monocular depth estimation using learning-based approaches has become promising in recent years. However, most monocular depth estimators either need to rely on large quantities of ground truth depth data, which is extremely expensive and difficult to obtain, or predict disparity as an intermediary step using a secondary supervisory signal leading to blurring and other artefacts. Training a depth estimation model using pixel-perfect synthetic data can resolve most of these issues but introduces the problem of …

Cited by 261 publications (181 citation statements)
References 62 publications
“…However, supervised learning-based approaches rely on expensive ground-truth depth data for training and are not flexible to be deployed in novel environments. Even if synthetic data generation has been proposed to partially tackle this issue [26], the cost of synthesizing realistic data remains high.…”
Section: Supervised Depth Estimation
confidence: 99%
“…The primary reason for using synthetic images [17] during training is that despite the increased depth density of the real-world imagery [54], depth information for the majority of the scene is still missing, leading to undesirable artefacts in regions where depth values are not available. A naïve solution would be to only use synthetic data to resolve the issue, but due to differences in the data domains, a model only trained on synthetic data cannot be expected to perform well on real-world images without domain adaptation [5,63]. Consequently, we opt for randomly sampling training images from both datasets to force the overall model to capture the underlying distribution of both data domains, and therefore, learn the full dense structure of a synthetic scene while simultaneously modelling the contextual complexity of the naturally-sensed real-world images.…”
Section: Proposed Approach
confidence: 99%
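The mixed-domain sampling strategy quoted above (drawing each training batch at random from the union of the synthetic and real-world datasets) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `mixed_domain_batches` and the uniform shuffle-then-slice strategy are hypothetical, not the citing paper's exact pipeline.

```python
import random

def mixed_domain_batches(synthetic, real, batch_size, seed=0):
    """Yield batches drawn at random from the union of two data domains,
    so that, on average, batches mix synthetic samples (dense ground-truth
    depth) with real-world samples (sparse, naturally-sensed depth)."""
    rng = random.Random(seed)
    # Tag each sample with its domain so downstream losses can
    # distinguish dense synthetic supervision from sparse real supervision.
    pool = [("synthetic", s) for s in synthetic] + [("real", r) for r in real]
    rng.shuffle(pool)  # uniform mixing across both domains
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]

# Example: 4 synthetic and 4 real images mixed into batches of 2.
batches = list(mixed_domain_batches(["s0", "s1", "s2", "s3"],
                                    ["r0", "r1", "r2", "r3"],
                                    batch_size=2))
```

Because samples are shuffled uniformly rather than stratified per batch, the model sees both domains in expectation over an epoch, which matches the stated goal of capturing the underlying distribution of both data domains.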
“…
• A joint multi-task framework for depth prediction encouraging improved geometric and contextual learning to boost performance.
• Monocular depth estimation via adversarial training, a deep architecture with skip connections and a robust compound objective function directly supervised using this framework to outperform prior contemporary work [5,7,14,20,31,36,62,66].
• Sparse to dense depth completion via the same multi-task model, capable of generating a dense depth output given a sparse depth input captured via a LiDAR sensor, with results superior to prior contemporary work [10,16,40,50,54].…”
Section: Introduction
confidence: 98%
“…Markov random fields (MRF) [38] and conditional random fields (CRF) [31] can be applied to regress depth from monocular images. More recent approaches use deep neural networks with multi-scale predictions [11,12], large-scale datasets [26,2] and user interactions [37]. Stereo provides strong cues for unsupervised learning [14,46] or semi-supervised learning with LiDAR [24].…”
Section: Related Work
confidence: 99%