2021
DOI: 10.1609/aaai.v35i3.16282

Patch-Wise Attention Network for Monocular Depth Estimation

Abstract: In computer vision, monocular depth estimation is the problem of obtaining a high-quality depth map from a two-dimensional image. This map provides information on three-dimensional scene geometry, which is necessary for various applications in academia and industry, such as robotics and autonomous driving. Recent studies based on convolutional neural networks achieved impressive results for this task. However, most previous studies did not consider the relationships between the neighboring pixels in a local ar…

Cited by 50 publications (10 citation statements) | References 34 publications
“…The best results are highlighted in bold.

[14], evaluation range 0-80m
Method | Range | Abs Rel | Sq Rel | RMSE | RMSE log | δ<1.25 | δ<1.25² | δ<1.25³
Eigen et al [13] | 0-80m | 0.203 | 1.548 | 6.307 | 0.282 | 0.702 | 0.898 | 0.967
DORN [15] | 0-80m | 0.072 | 0.307 | 2.727 | 0.120 | 0.932 | 0.984 | 0.994
VNL [61] | 0-80m | 0.072 | - | 3.258 | 0.117 | 0.938 | 0.990 | 0.998
BTS [30] | 0-80m | 0.061 | 0.261 | 2.834 | 0.099 | 0.954 | 0.992 | 0.998
PWA [31] | 0-80m | 0.060 | 0.221 | 2.604 | 0.093 | 0.958 | 0.994 | 0.999
TransDepth [60] | 0-80m | 0.064 | 0.252 | 2.755 | 0.098 | 0.956 | 0.994 | 0.999
Adabins [5] | 0-80m | 0.058 | 0.190 | 2.360 | 0.088 | 0.964 | 0.995 | 0.999
P3Depth [41] | 0-80m | 0.071 | 0.270 | 2.842 | 0.103 | 0.953 | 0.993 | 0.998
DepthFormer [32] | 0-80m | 0.052 | 0.158 | 2.143 | 0.079 | 0.975 | 0.997 | 0.999
NeWCRFs [63] | 0-80m | 0.052 | 0.155 | 2.129 | 0.079 | 0.974 | 0.997 | 0.999
PixelFormer [1] | 0-80m | 0.051 | 0.149 | 2.081 | 0.077 | 0.976 | 0.997 | 0.999
BinsFormer [33] | 0-80m | 0.052 | 0.151 | 2.098 | 0.079 | 0.974 | 0.997 | 0.999
VA-Depth [35] | 0-80m | 0.050 | - | 2.090 | 0.079 | 0.977 | 0.997 | -
URCDC-Depth [47] | 0-80m | 0.050 | 0.142 | 2.032 | 0.076 | 0.977 | 0.997 | 0.999
DiffusionDepth (ours) | 0-80m | 0.050 | 0.141 | 2.016 | 0.074 | 0.977 | 0.998 | 0.999

Official Offline Split [16], evaluation range 0-50m
BTS [30] | 0-50m | 0.058 | 0.183 | 1.995 | 0.090 | 0.962 | 0.994 | 0.999
PWA [31] | 0-50m | 0.057 | 0.161 | 1.872 | 0.087 | 0.965 | 0.995 | 0.999
TransDepth [60] | 0-50m | 0.061 | 0.185 | 1.992 | 0.091 | 0.963 | 0.995 | 0.999
P3Depth [41] | 0-50m | 0.055 | 0.130 | 1.651 | 0.081 | 0.974 | 0.997 | 0.999
URCDC-Depth [47] | 0-50m | 0.049 | 0.108 | 1.528 | 0.072 | 0.981 | 0.998 | 1.000
DiffusionDepth (ours) | 0-50m | 0.041 | 0.103 | 1.418 | 0.069 | 0.986 | 0.999 | 1.000…”

Section: Methods
Mentioning confidence: 99%
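The column labels in the table above are inferred from the standard seven-metric protocol for monocular depth evaluation (Abs Rel, Sq Rel, RMSE, RMSE log, and the three δ-threshold accuracies); the flattened text did not carry its own headers. As an illustration only, and not code from any of the cited papers, a minimal NumPy sketch of how these metrics are usually computed over valid pixels within the evaluation range:

```python
import numpy as np

def depth_metrics(gt: np.ndarray, pred: np.ndarray, max_depth: float = 80.0):
    """Standard monocular depth metrics on pixels with valid GT inside the range."""
    # Keep only pixels with a valid ground-truth depth inside the evaluation range.
    mask = (gt > 1e-3) & (gt < max_depth)
    gt, pred = gt[mask], np.clip(pred[mask], 1e-3, max_depth)

    abs_rel = np.mean(np.abs(gt - pred) / gt)                        # Abs Rel
    sq_rel = np.mean((gt - pred) ** 2 / gt)                          # Sq Rel
    rmse = np.sqrt(np.mean((gt - pred) ** 2))                        # RMSE
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))    # RMSE log

    # Threshold accuracies: fraction of pixels whose ratio error stays below 1.25^k.
    ratio = np.maximum(gt / pred, pred / gt)
    d1, d2, d3 = (np.mean(ratio < 1.25 ** k) for k in (1, 2, 3))
    return abs_rel, sq_rel, rmse, rmse_log, d1, d2, d3
```

Lower values are better for the four error metrics; higher values are better for the three δ accuracies, which is consistent with the ordering of the rows above.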
“…The NYU-Depth-v2 dataset is collected from indoor scenes at a resolution of 640 × 480 pixels [37] and provides dense depth ground truth (density > 95%). Following prior works, we adopt the official split and the dataset processed by Lee et al [31], which contains 24231 training images and 654 testing images.…”
Section: Methods
Mentioning confidence: 99%
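As a small, hedged illustration of the "density > 95%" figure quoted above (not part of the cited pipeline; the file name is hypothetical), density here simply means the fraction of pixels carrying a valid depth value:

```python
import numpy as np

def depth_density(depth: np.ndarray) -> float:
    """Fraction of pixels with a valid (non-zero) depth value."""
    return float((depth > 0).mean())

# Hypothetical example: a dense 640 x 480 NYU-Depth-v2 GT map should score above 0.95.
gt = np.load("nyu_gt_depth.npy")  # hypothetical file name, not from the cited work
print(f"GT density: {depth_density(gt):.3f}")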
“…We test several values of the attention stride to explore their impact. While some models have applied a stride of 2 (Aich et al, 2021; Lee et al, 2021), our training finds the best results with a stride of 3, similar to (Li et al, 2021; Xu et al, 2018), given the limits of a single GPU (experiments were performed on an NVIDIA RTX TITAN with 24 GB of memory): reducing the attention stride further would shrink the crop size of the data and harm performance. The results for the attention stride on the disparity-change task are shown in Table 3.…”
Section: Memory-feasible Implementation About Sttr3d
Mentioning confidence: 99%
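The memory argument in this statement can be made concrete with a toy sketch of strided self-attention (an illustrative assumption, not the cited STTR3D implementation): sub-sampling the feature map with stride s reduces the token count by roughly s², so the N × N attention matrix shrinks by roughly s⁴, which is why a larger stride fits a larger crop on a single 24 GB GPU.

```python
import torch
import torch.nn.functional as F

def strided_self_attention(feat: torch.Tensor, stride: int) -> torch.Tensor:
    """feat: (B, C, H, W). Toy single-head attention over positions sampled every `stride` pixels."""
    b, c, h, w = feat.shape
    sub = feat[:, :, ::stride, ::stride]          # (B, C, H/s, W/s)
    hs, ws = sub.shape[2], sub.shape[3]
    tokens = sub.flatten(2).transpose(1, 2)       # (B, N, C) with N = hs * ws
    # The N x N attention matrix is the term that dominates GPU memory.
    attn = torch.softmax(tokens @ tokens.transpose(1, 2) / c ** 0.5, dim=-1)
    out = (attn @ tokens).transpose(1, 2).reshape(b, c, hs, ws)
    # Upsample the attended features back to the full resolution.
    return F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)

# Going from stride 2 to stride 3 shrinks N by (3/2)^2 ≈ 2.25x and the N x N
# attention matrix by ≈ 5x, freeing memory for a larger input crop on one GPU.
```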
“…DiffusionDepth [25] employed hierarchical aggregation and heterogeneous interaction to enhance the feature information across scales. Sihaeng et al [26] proposed a Partitioned Attention Module to fuse spatial and channel information for improved depth-detail representation. Some researchers have introduced transformers to leverage global information effectively.…”
Section: Monocular Depth Estimation Based On Local and Global Informa...
Mentioning confidence: 99%