2022
DOI: 10.48550/arxiv.2208.03543
Preprint

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

Abstract: Figure 1. Effects of global reasoning on self-supervised monocular depth estimation. The limited receptive field of existing solutions (e.g., HR-Depth [30], middle) often yields inaccurate depth estimates and loses fine-grained details (such as the car and cyclist highlighted in yellow). In contrast, our MonoViT architecture (right) achieves superior results.

citations
Cited by 3 publications
(8 citation statements)
references
References 59 publications
“…Our accuracy is superior to that of the newly proposed approaches, such as MonoFormer [64], CADepth [21], and DIFFNet [23] in all metrics. We also contrast the most advanced approach currently available, MonoViT [22]. Our results on Abs Rel and δ 1 are comparable, but we perform better on other measures, particularly Sq Rel.…”
Section: Quantitative Evaluation (mentioning)
confidence: 69%
“…Furthermore, Godard et al. [6] proposed a classical method Monodepth2, and they adopted an automasking scheme to filter out invalid pixels from moving objects and introduced a minimum reprojection loss to address occlusions. Based on Monodepth2, numerous current self-supervised monocular depth estimation approaches [21][22][23] are further researched. Liu et al. [24] proposed a domain-separated network for self-supervised depth estimation of all-day images.…”
Section: B. Self-supervised Monocular Depth Estimation (mentioning)
confidence: 99%
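
For readers unfamiliar with the metrics named in the first citation statement (Abs Rel, Sq Rel, δ1), the following is a minimal sketch of how they are commonly computed in the depth-estimation literature. It is not taken from MonoViT or from the citing papers. The function name `depth_metrics`, its arguments, and the plain-list inputs are assumptions made for illustration. The 1.25 threshold is the conventional choice for δ1.

```python
def depth_metrics(pred, gt, threshold=1.25):
    """Return (abs_rel, sq_rel, delta1) for paired, strictly positive depths.

    pred and gt are equal-length sequences of predicted and ground-truth
    depths (hypothetical inputs; real evaluations use masked depth maps).
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")

    n = len(gt)
    pairs = list(zip(pred, gt))

    # Abs Rel: mean of |pred - gt| / gt
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Sq Rel: mean of (pred - gt)^2 / gt
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n

    # delta1: fraction of pixels with max(pred/gt, gt/pred) below the threshold
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return abs_rel, sq_rel, delta1
```

Lower is better for Abs Rel and Sq Rel, and higher is better for δ1. Sq Rel squares the error before normalizing, so it penalizes large errors more heavily than Abs Rel does.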