Depth Estimation from Monocular Image and Coarse Depth Points based on Conditional GAN

Li, Yaoxin; Qian, Keyuan; Zhou, Jingkun

doi:10.1051/matecconf/201817503055

Cited by 8 publications

(12 citation statements)

References 12 publications

“…Multimodal data-based depth estimation commonly uses inputs containing two or three modalities of data [7], [8], [35]-[37]. The method [7] converted depth estimation into distance prediction between reference and true depth maps, performing more effectively than the depth prediction [35].…”

Section: Related Work (mentioning)

confidence: 99%

“…Wang et al [36] inferred depth by iteratively changing intermediate representation in pre-trained depth estimation models. Li et al [37] employed depth samples and RGB images to estimate depth. Sparse depth-based depth prediction also used deep learning models [3], [8], [38] to predict depth.…”

Section: Related Work (mentioning)

confidence: 99%

“…1) A few valid depth points are uniformly and randomly sampled from depth images on training and testing datasets, respectively. For a fair comparison, the number of valid depth samples is set to the same as that of research [8], [37].…”

Section: Three Modalities Of Inputs (mentioning)

confidence: 99%

“…The probability of depth samples at each position is about identical in a depth map because we sample depth points uniformly and randomly. The number of valid depth samples is set to the same as that of work [8], [37], [38] for a fair comparison. In each depth image, the largest number of valid depth samples is 200, which accounts for 0.04% and 0.06% of the total on the KITTI and NYU-Depth-v2 images, respectively.…”

Section: Three Modalities Of Inputs (mentioning)

confidence: 99%
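
The sampling step described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `sample_sparse_depth`, the convention that a pixel is valid when its depth is greater than zero, and the nested-list depth map are all assumptions made for the example. Only the cap of 200 samples and the uniform random selection come from the quoted text.

```python
import random


def sample_sparse_depth(depth, num_samples=200, seed=None):
    """Keep at most `num_samples` valid pixels, chosen uniformly at random.

    `depth` is a list of rows of floats. A pixel is treated as valid when its
    depth is > 0 (an assumption; the quoted text does not define validity).
    All pixels that are not selected are set to 0.0 in the returned map.
    """
    rng = random.Random(seed)
    # Collect the coordinates of every valid depth pixel.
    valid = [
        (r, c)
        for r, row in enumerate(depth)
        for c, d in enumerate(row)
        if d > 0
    ]
    # random.sample draws without replacement, so each valid pixel is
    # equally likely to be selected.
    chosen = rng.sample(valid, min(num_samples, len(valid)))
    sparse = [[0.0] * len(row) for row in depth]
    for r, c in chosen:
        sparse[r][c] = depth[r][c]
    return sparse
```

Passing a fixed `seed` makes the selected points reproducible, which may help when the same sparse input has to be reused across compared methods.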

“…To evaluate DEM quantitatively, we use the error metrics which are also adopted by studies [1], [4]-[6], [7], [8], [15], [16], [24]-[26], [27], [28], [35], [37]-[39], [43]. The error metrics consider global statistics between a ground-truth depth image Y containing N depth pixels and a corresponding predicted depth map P consisting of N depth pixels.…”

Section: Error Metrics (mentioning)

confidence: 99%
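
The excerpt above is truncated before it names the metrics, so the sketch below only illustrates the general idea of global statistics over a ground-truth map Y and a prediction P. The three metrics shown (root mean squared error, mean absolute relative error, and threshold accuracy with a ratio below 1.25) are ones commonly reported in the depth estimation literature; choosing them here is an assumption, as are the function name `depth_errors`, the flat-list inputs, and the rule that only pixels with ground-truth depth greater than zero are scored.

```python
import math


def depth_errors(Y, P, threshold=1.25):
    """Compute common depth error statistics over valid ground-truth pixels.

    Y and P are flat sequences of equal length holding ground-truth and
    predicted depths. Pixels with Y <= 0 are skipped (assumed invalid).
    """
    pairs = [(y, p) for y, p in zip(Y, P) if y > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid ground-truth pixels")

    # Root mean squared error.
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in pairs) / n)
    # Mean absolute relative error.
    rel = sum(abs(y - p) / y for y, p in pairs) / n

    def ratio(y, p):
        # A non-positive prediction can never satisfy the threshold.
        return max(y / p, p / y) if p > 0 else math.inf

    # Fraction of pixels whose prediction is within the ratio threshold.
    delta = sum(1 for y, p in pairs if ratio(y, p) < threshold) / n
    return {"rmse": rmse, "rel": rel, "delta": delta}
```

Lower `rmse` and `rel` indicate a better prediction, while a higher `delta` does.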

Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model

Liu et al. 2020, IEEE Access

Depth estimation has received considerable attention and is often applied to visual simultaneous localization and mapping (SLAM) for scene reconstruction. At least to our knowledge, sufficiently reliable depth always fails to be provided for monocular depth estimation-based SLAM because new image features are rarely re-exploited effectively, local features are easily lost, and relative depth relationships among depth pixels are readily ignored in previous depth estimation methods. Based on inaccurate monocular depth estimation, SLAM still faces scale ambiguity problems. To accurately achieve scene reconstruction based on monocular depth estimation, this paper makes three contributions. (1) We design a depth estimation model (DEM), consisting of a precise encoder to re-exploit new features and a decoder to learn local features effectively. (2) We propose a loss function using the depth relationship of pixels to guide the training of DEM. (3) We design a modular SLAM system containing DEM, feature detection, descriptor computation, feature matching, pose prediction, keyframe extraction, loop closure detection, and pose-graph optimization for pixel-level scene reconstruction. Extensive experiments demonstrate that the DEM and DEM-based SLAM are effective. (1) Our DEM predicts more reliable depth than the state of the arts when inputs are RGB images, sparse depth, or the fusion of both on public datasets. (2) The DEM-based SLAM system achieves comparable accuracy as compared with well-known modular SLAM systems.

Index Terms: Convolutional neural networks, depth estimation, decoder, encoder, simultaneous localization and mapping.