URNet: An UNet-Based Model with Residual Mechanism for Monocular Depth Estimation

Duong, Hoang-Thanh; Chen, Hsi-Min; Chang, Che-Cheng

doi:10.3390/electronics12061450

Cited by 5 publications

(4 citation statements)

References 52 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…This model achieves a good balance between accuracy and computation time. Duong et al [32] constructed a depth estimation model called UR-Net, which adds attention to the decoder and replaces the transmission block in the conventional U-Net with spatial pyramid pool blocks (ASPP).…”

Section: U-netmentioning

confidence: 99%

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

Zhao,

Song,

Wang

2023

IEEE Access

View full text Add to dashboard Cite

Monocular depth estimation technology is widely utilized in autonomous driving for sensing and obstacle avoidance. Recent advancements in deep-learning techniques have resulted in significant progress in monocular depth estimation. However, monocular depth estimation is mainly optimized for the luminosity error of pixels, mostly disregarding the related problems of result ambiguity and boundary artifacts in the image. To address these issues, we developed an improved network model called SAU-Net. The superposition of excessive convolutional layers in conventional convolution networks impairs the network's timeliness and results in the loss of primary information. Therefore, we propose a convolutionfree stratified transformer as an image feature extractor at the network's coding end, which limits selfattention to innumerable windows and leverages sliding windows for characterization to reduce the network delay. This study also addresses the issue of critical information loss. We connect each feature map directly to another from a different scale. In addition, an attention module is introduced to focus on the effective features, which increases the amount of target information in the depth map. We employ the gradient loss function during the training stage to improve the segmentation accuracy of the network and the smoothness of the output image. Training and testing were conducted using the KITTI dataset. To ensure the robustness of the algorithm in practical applications, we also validated the algorithm using a campus dataset that we collected. The experimental results indicated that the accuracy of the algorithm was 89.1%, 96.4%, and 98.5% under three proportional thresholds. The estimated depth map was clear in details and edges, with fewer artifacts.INDEX TERMS autonomous driving, monocular depth estimation, SAU-net, stratified transformer.

show abstract

Section: U-netmentioning

confidence: 99%

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

Zhao,

Song,

Wang

2023

IEEE Access

View full text Add to dashboard Cite

show abstract

“…He and his scholars do this by initially generating a rough prediction of deep information, and then using another neural network to refine it to produce more accurate results. Since then, a great deal of work has been done to improve the accuracy of supervised depth estimation of monocular images, including the addition of residual mechanism [16], the use of conditional random scene methods [17], the anti-Huber distance loss function method [18], the joint optimization of surface normals [19], the fusion of multiple depth maps [20], the add of an additional channel to the output layer [21] and the method of expressing it as an ordinal classification problem [22]. People even apply the method to microscopic scenes [23].…”

Section: Fusion Of Unsupervised Learning and Deep Estimation Algorithmsmentioning

confidence: 99%

Monocular Depth Estimation for 3D Map Construction at Underground Parking Structures

Song

Gao

et al. 2023

Electronics

View full text Add to dashboard Cite

Converting the actual scenes into three-dimensional models has inevitably become one of the fundamental requirements in autonomous driving. At present, the main obstacle to large-scale deployment is the high-cost lidar for environment sensing. Monocular depth estimation aims to predict the scene depth and construct a 3D map via merely a monocular camera. In this paper, we add geometric consistency constraints to address the non-Lambertian surface problems in depth estimation. We also utilize the imaging principles and conversion rules to produce a 3D scene model from multiple images. We built a prototype and conduct extensive experiments in a corridor and an underground parking structure, and the results show the effectiveness for indoor location-based services.

show abstract

“…Devido a sua arquitetura flexível e facilmente modificável, a U-Net tem sido uma arquitetura de CNN comumente adotada como base para a construc ¸ão de novas redes na tarefa de estimativa de profundidade [29,30,31]. Saxena et al [31] combinaram a U-Net, utilizando a EfficientNet como codificador, com a reconstruc ¸ão de mapas de profundidades ruidosos para melhorar significativamente a estimativa de profundidade.…”

Section: Introduc ¸ãOunclassified

Comparing U-Net based architectures in monocular depth estimation

Silva,

Gazolli

2023

Anais Do XVIII Workshop De Visão Computacional (WVC 2023)

View full text Add to dashboard Cite

Monocular depth estimation is a computer vision problem which has diverse applications ranging from augmented reality to surgical procedures. Given the similarity between the segmentation and monocular depth estimation tasks, in addition to the good performance of the U-net network and its variations in the segmentation task, this study aims to compare the performance of variations of U-Net and UNet++ architectures, each one adopting a different network as encoder, and the TransUnet architecture in monocular depth estimation. The results achieved on the NYU Depth V2 dataset shows that U-Net using Mix Transformer (MiT-B2) as encoder outperforms all other evaluated approaches.

show abstract

URNet: An UNet-Based Model with Residual Mechanism for Monocular Depth Estimation

Cited by 5 publications

References 52 publications

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

Monocular Depth Estimation for 3D Map Construction at Underground Parking Structures

Comparing U-Net based architectures in monocular depth estimation

Contact Info

Product

Resources

About