AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

Kundu, Jogendra Nath; Uppala, Phani Krishna; Pahuja, Anuj; Babu, R. Venkatesh

doi:10.48550/arxiv.1803.01599

Cited by 1 publication

(2 citation statements)

References 0 publications

“…Coming to domain adaptation for depth estimation, Atapour et al [2] developed a two-stage method which first learned a image translator [47] to stylize the real images into synthetic images, and then trained a supervised depth estimation network using the original synthetic images. Kundu et al [20] proposed a content congruent regularization method to address the model collapse problem which usually happens in high-dimensional data. Recently, Zheng et al [45] developed an end-to-end adaptation network, i.e.…”

Section: Domain Adaptation (mentioning)

confidence: 99%

“…The ground truth depth in the synthetic data allows supervised training of depth estimation networks. Since the synthetic data have different characteristics than the real data, recent works [2,20,45] used domain mapping [48] to reduce the discrepancy between synthetic and real domains and obtained impressive depth estimation performance. However, translated images by current unsupervised domain mapping methods suffer from undesirable distortions, which undermines depth prediction accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Depth from Monocular Videos Using Synthetic Data: A Temporally-Consistent Domain Adaptation Approach

Mou, Gong, Fu et al. 2019

Preprint

The majority of state-of-the-art monocular depth estimation methods are supervised learning approaches. The success of such approaches depends heavily on high-quality depth labels, which are expensive to obtain. Recent methods try to learn depth networks by exploring unsupervised cues from monocular videos, which are easier to acquire but less reliable. In this paper, we propose to resolve this dilemma by transferring knowledge from synthetic videos with easily obtainable ground truth depth labels. Due to the style difference between synthetic and real images, we propose a temporally-consistent domain adaptation (TCDA) approach that simultaneously explores labels in the synthetic domain and temporal constraints in the videos to improve style transfer and depth prediction. Furthermore, we make use of the ground truth optical flow and pose information in the synthetic data to learn moving mask and pose prediction networks. The learned moving masks can filter out moving regions that produce erroneous temporal constraints, and the estimated poses provide better initializations for estimating temporal constraints. The experimental results demonstrate the effectiveness of our method and performance comparable to the state of the art.