DTS-Depth: Real-Time Single-Image Depth Estimation Using Depth-to-Space Image Construction

Ibrahem, Hatem; Salem, Ahmed; Kang, Hyun‐Soo

doi:10.3390/s22051914

Cited by 5 publications

(4 citation statements)

References 40 publications

“…Lee et al [ 13 ] proposed a CNN-based method namely From big to small (BTS) which utilizes local planar guidance layers at different scales in the decoder stage that guides the feature maps to accurate depth predictions. We also provided challenging depth estimation results in previous research [ 14 , 15 ] in which we eliminate the complexity of the decoder in the encoder-decoder CNN architecture using depth-to-space (pixel-shuffle) image reconstruction. Although the previously stated methods attained relatively good results, the estimated depth in most of the stated methods has blurry results especially at the borders of the objects in the scene due to the inefficient encoding and decoding stages due to the local learning scheme naturally provided by the convolution algorithm.…”

Section: Related Work (mentioning)

confidence: 99%

“…Depth estimation is a critical task in a variety of computer vision applications, including 3D scene reconstruction from 2D images, medical 3D imaging, augmented reality, self-driving cars and robots, and 3D computer graphics and animations. The recent advances in depth estimation research have shown the effectiveness of the convolutional neural networks (CNNs) in performing such a task [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 ]. The encoder-decoder CNN architectures are the most used architectures in the dense prediction tasks [ 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 ] (image-like predictions such as semantic segmentation and depth estimation).…”

Section: Introduction (mentioning)

confidence: 99%

RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers

Ibrahem, Salem, Kang (2022). Sensors. Self-citation.

The latest research in computer vision highlighted the effectiveness of the vision transformers (ViT) in performing several computer vision tasks; they can efficiently understand and process the image globally unlike the convolution which processes the image locally. ViTs outperform the convolutional neural networks in terms of accuracy in many computer vision tasks but the speed of ViTs is still an issue, due to the excessive use of the transformer layers that include many fully connected layers. Therefore, we propose a real-time ViT-based monocular depth estimation (depth estimation from single RGB image) method with encoder-decoder architectures for indoor and outdoor scenes. This main architecture of the proposed method consists of a vision transformer encoder and a convolutional neural network decoder. We started by training the base vision transformer (ViT-b16) with 12 transformer layers then we reduced the transformer layers to six layers, namely ViT-s16 (the Small ViT) and four layers, namely ViT-t16 (the Tiny ViT) to obtain real-time processing. We also try four different configurations of the CNN decoder network. The proposed architectures can learn the task of depth estimation efficiently and can produce more accurate depth predictions than the fully convolutional-based methods taking advantage of the multi-head self-attention module. We train the proposed encoder-decoder architecture end-to-end on the challenging NYU-depthV2 and CITYSCAPES benchmarks then we evaluate the trained models on the validation and test sets of the same benchmarks showing that it outperforms many state-of-the-art methods on depth estimation while performing the task in real-time (∼20 fps). We also present a fast 3D reconstruction (∼17 fps) experiment based on the depth estimated from our method which is considered a real-world application of our method.

“…The DS module was employed in recent CNN-based depth methods [13][14] but the SD module was not presented as a down-sampling technique like we propose in this research. The suggested architecture outperforms state-of-the-art (SOTA) approaches for depth estimation, despite being simpler and less sophisticated than the SOTA methods.…”

Section: Related Work (mentioning)

confidence: 99%

SD-Depth: Light-Weight Monocular Depth Estimation Using Space Depth CNN for Real-Time Applications

Ibrahem, Salem, Kang (2022). Machine Learning and Artificial Intelligence.

With the help of the space-to-depth and depth-to-space modules, we provide a convolutional neural network design for depth estimation. We show designs that down sample the spatial information of the picture utilizing space-to-depth (SD) as opposed to the widely used pooling methods (Max-pooling and Average-pooling). The space-to-depth module may shrink the image while maintaining the spatial information of the image in the form of additional depth information. This technique is far superior to Max-pooling, which diminishes the image’s information and features. We also suggest a lightweight decoder step that builds a high-resolution depth map out of many low-resolution feature maps using the depth-to-space (DS) module. The suggested architecture effectively learns depth estimation with high processing speed and accuracy. We trained and evaluated our suggested model on NYU-depthV2 dataset and attained low error values (RMSE=0.342) and high delta accuracies (δ3=0.996) at a fast-processing speed (25Fps).
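The abstract above contrasts space-to-depth (SD) down-sampling with pooling and uses depth-to-space (DS) to rebuild a high-resolution map from low-resolution feature maps. The following is a minimal pure-Python sketch of the two rearrangements, not the authors' implementation. It assumes an H x W x C layout stored as nested lists and a block-offset-major channel ordering; the function names and that ordering are illustrative choices, and deep-learning frameworks may order channels differently.

```python
def space_to_depth(x, r):
    """Rearrange an H x W x C image into (H/r) x (W/r) x (C*r*r).

    Every r x r spatial block is moved into the channel axis, so the
    image shrinks spatially without discarding any value.
    """
    h, w = len(x), len(x[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    return [
        [
            # Concatenate the channels of all r*r pixels in this block.
            [v
             for di in range(r)
             for dj in range(r)
             for v in x[i * r + di][j * r + dj]]
            for j in range(w // r)
        ]
        for i in range(h // r)
    ]


def depth_to_space(y, r):
    """Inverse rearrangement: h x w x (C*r*r) back to (h*r) x (w*r) x C."""
    h, w = len(y), len(y[0])
    depth = len(y[0][0])
    if depth % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = depth // (r * r)
    out = [[None] * (w * r) for _ in range(h * r)]
    for i in range(h):
        for j in range(w):
            for di in range(r):
                for dj in range(r):
                    k = (di * r + dj) * c  # start of this pixel's channels
                    out[i * r + di][j * r + dj] = y[i][j][k:k + c]
    return out


# Hypothetical 4 x 4 single-channel image holding the values 0..15.
image = [[[row * 4 + col] for col in range(4)] for row in range(4)]
packed = space_to_depth(image, 2)      # 2 x 2 spatial grid, 4 channels
restored = depth_to_space(packed, 2)   # back to 4 x 4, 1 channel
assert restored == image               # the round trip loses nothing
```

Because the round trip restores the input exactly, SD keeps every input value in the channel axis, whereas 2 x 2 max-pooling would keep only one of every four.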

“…Moreover, depth estimation is one of the effective methods that is utilized in several applications such as 3D imaging and scanning, background removal and separation, and 3D object rendering. Recently, depth estimation methods are proposed using effectiveness of modern convolutional neural networks (CNNs) [14,15].…”

Section: Introduction (mentioning)

confidence: 99%

Simplified digital content generation using single-shot depth estimation for full-color holographic printing system

Khuderchuluun, Darkhanbaatar, Kwon et al. (2024). Practical Holography XXXVIII: Displays, Materials, and Applications.

In this paper, simplified digital content generation using single-shot depth estimation for full-color holographic printing system is proposed. Firstly, digital content generation is analyzed completely before the hardware system of holographic printing is run to provide a high-quality three-dimensional (3D) scene without degrading information of the original 3D object. Here, the single-shot depth estimation method is applied, and 3D information is acquired from the estimated high-quality depth data and a given single 2D image. Then the array of sub-holograms (hogels) is generated directly by implementing fully analyzed computation considering chromatic aberration for full-color printing. Finally, the generated hogels are recorded into holographic material sequentially via effectual time-controlled exposure under synchronized control with three electrical shutters for RGB laser beam illuminations to obtain full-color 3D reconstruction. Numerical simulation and optical reconstructions are implemented successfully.
