2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.01264

Towards Interpretable Deep Networks for Monocular Depth Estimation

citations
Cited by 11 publications
(4 citation statements)
references
References 31 publications
“…Hu et al attempted to determine the most relevant sparse pixels for depth estimation [17]. You et al first determined the depth selectivity of some hidden units, which showed good insights for interpreting monocular depth estimation models [18]. Zhi et al proposed a novel disentangled latent Transformer model based on the multi-level interpretation [19].…”
Section: Pixel-wise Dense Prediction (mentioning)
confidence: 99%
“…The lack of interpretability hinders the application of monocular depth estimation to downstream tasks (e.g., autonomous driving) [42]. Most existing solutions devote to achieving more advanced performance but ignore the model's interpretability.…”
Section: Interpretable Analysis (mentioning)
confidence: 99%
“…Or they just backward inference how the model works according to its advanced performance. Even You et al [42] enhance the interpretability by the depth selectivity of the model's hidden units. But the process of selecting hidden units is based on observation, which is also a backward inference.…”
Section: Interpretable Analysis (mentioning)
confidence: 99%
“…To integrate visual odometry (VO) or the SLAM system into depth estimation, the authors of [10,12,13] presented a neural network to correct classical VO estimators in a self-supervised manner and enhance geometric constraints. Self-supervised depth estimation, using the pose and depth between two adjacent frames, establishes a depth reprojection error and image reconstruction error [14][15][16][17]. In a monocular depth self-supervised estimation, the depth value estimated by the depth estimation network (DepthNet) and the pose between adjacent images have a decisive influence on the depth estimation result.…”
Section: Introduction (mentioning)
confidence: 99%