2021 International Conference on 3D Vision (3DV) 2021
DOI: 10.1109/3dv53792.2021.00092

Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation

Cited by 24 publications (5 citation statements)
References 39 publications
“…In 2021, Gao et al. proposed a weakly supervised MDE method utilizing a commonly used U-net architecture [31]. Both encoder and decoder are constructed based on VGG blocks and channel self-attention modules to recalibrate the channel-wise information.…”
Section: Supervised Training (mentioning)
confidence: 99%
“…The low-resolution disparity is then processed by a multi-scale decoder to yield high-resolution disparity. A temporally consistent depth (TC-Depth) prediction method was proposed by Ruhkamp et al. [31], in which a spatial-temporal attention block is placed at the bottleneck to aggregate geometric and sequential consistencies. Unlike those of SE and CBAM, TC-Depth adopts the physical distance between points in 3D space to measure the spatial attention map, defined as follows:…”
Section: Self-supervised Training (mentioning)
confidence: 99%
“…In addition, several studies improve performance by adding additional semantic information or motion information learning networks [41]- [44]. In addition, multi-frame-based evaluation is performed to further utilize geometric information [2], [37], [42], [45], [46].…”
Section: B Self-supervised Monocular Depth Estimation (mentioning)
confidence: 99%
“…3D geometric constraints [27,6,45] were proposed that penalize the Euclidean distances between the reconstructed point clouds of two consecutive frames, after transforming one to the other. Similar geometric constraints on the depth maps were proposed [1,25,35] which minimize the inconsistency of the estimated disparity maps of two consecutive frames, after warping one onto the other. Our work lies in this category and differs in the fact that we propose complementary constraints on the pose, which can, in principle, be added and used together with the other temporal constraints.…”
Section: Self-supervised Temporal Consistency (mentioning)
confidence: 99%
“…Ruhkamp et al. [35] propose a strategy to detect and mask inconsistent regions, such as occlusions, in neighboring depth maps via a cycle consistency of the reprojected RGB frames.…”
Section: Similar Constraints In Literature (mentioning)
confidence: 99%