Depth Estimation and Semantic Segmentation from a Single RGB Image Using a Hybrid Convolutional Neural Network

Lin, Xiao; Sánchez-Escobedo, Dalila; Casas, Josep R.; Pardàs, Montse

doi:10.3390/s19081795

Cited by 28 publications

(31 citation statements)

References 34 publications

(82 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The thresholded accuracy means the ratio of the maximum relative error

below the threshold

. In our experiments, we use

to align the setting with previous studies [ 6 , 8 , 13 , 15 , 16 , 17 , 42 ].…”

Section: Methodsmentioning

confidence: 99%

“…Multi-task learning with the goal of learning multiple tasks simultaneously [ 40 ] is nowadays based on CNNs [ 13 , 15 , 16 , 17 , 18 , 19 , 20 , 41 , 42 , 43 , 44 , 44 ]. Eigen and Fergus proposed a network structure that can estimate the depth, semantic labels, and the surface orientation of a scene [ 13 ].…”

Section: Related Workmentioning

confidence: 99%

“…They also proposed an attention-driven loss to deal with long-tail data distribution. Lin et al [ 17 ] presented a hybrid CNN that combines a common feature extraction network and a global depth estimation network for understanding the global layout of the scene. Nekrasov et al [ 16 ] used knowledge distillation and achieved a real-time performance.…”

Section: Related Workmentioning

confidence: 99%

“…However, these 2D representations are limited to obtain the 2D contextual information in an image due to 2D CNNs. In contrast to existing approaches [ 15 , 16 , 17 ], we propose a 3D feature representation, namely, a latent 3D volume. This representation enhances the 3D contextual information by introducing 3D CNNs.…”

Section: Related Workmentioning

confidence: 99%

See 3 more Smart Citations

Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

Ito

Kaneko

Kim

2020

Sensors

View full text Add to dashboard Cite

This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering the real world is three-dimensional, this 2D arrangement reduces one dimension and may limit the capacity of feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than in a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We will show that the latent 3D volume is beneficial to the tasks of depth estimation and semantic segmentation because these tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We apply depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.

show abstract

“…The thresholded accuracy means the ratio of the maximum relative error

below the threshold

. In our experiments, we use

to align the setting with previous studies [ 6 , 8 , 13 , 15 , 16 , 17 , 42 ].…”

Section: Methodsmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

See 2 more Smart Citations

Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

Ito

Kaneko

Kim

2020

Sensors

View full text Add to dashboard Cite

show abstract

“…Monocular Depth Estimation [22][23][24][25][26][27][28] infers depth information from single RGB images and is demonstrated to be an ill-posed problem(In most cases, there are several possible outputs corresponding to a given input image and the problem can be seen as a task of selecting the most proper one from all the possible outputs [29]). Stereo Depth Estimation [30][31][32] needs specific devices to capture stereo images and can provide much more clues to estimate the depth than Monocular Depth Estimation.…”

Section: Introductionmentioning

confidence: 99%

Multi-Scale Spatio-Temporal Feature Extraction and Depth Estimation from Sequences by Ordinal Classification

Liu

2020

Sensors

View full text Add to dashboard Cite

Depth estimation is a key problem in 3D computer vision and has a wide variety of applications. In this paper we explore whether deep learning network can predict depth map accurately by learning multi-scale spatio-temporal features from sequences and recasting the depth estimation from a regression task to an ordinal classification task. We design an encoder-decoder network with several multi-scale strategies to improve its performance and extract spatio-temporal features with ConvLSTM. The results of our experiments show that the proposed method has an improvement of almost 10% in error metrics and up to 2% in accuracy metrics. The results also tell us that extracting spatio-temporal features can dramatically improve the performance in depth estimation task. We consider to extend this work to a self-supervised manner to get rid of the dependence on large-scale labeled data.

show abstract

Depth-based adaptable image layer prediction using bidirectional depth semantic fusion

Lin,

Fan,

Huang

et al. 2024

Vis Comput

View full text Add to dashboard Cite

Depth Estimation and Semantic Segmentation from a Single RGB Image Using a Hybrid Convolutional Neural Network

Cited by 28 publications

References 34 publications

Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

Multi-Scale Spatio-Temporal Feature Extraction and Depth Estimation from Sequences by Ordinal Classification

Depth-based adaptable image layer prediction using bidirectional depth semantic fusion

Contact Info

Product

Resources

About