2020
DOI: 10.1109/lra.2020.2969938

Unsupervised Depth Completion From Visual Inertial Odometry

Abstract: We describe a method to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have only a few hundred to a few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to…

Cited by 98 publications (194 citation statements)
References 34 publications
“…Our goal is to recover a 3D scene from a real RGB image I_t : Ω ⊂ R^2 → R^3_+ and the associated set of sparse depth measurements z : Ω_z ⊂ Ω → R_+, without access to ground-truth depth annotations. We follow the unsupervised monocular training paradigm [21], [33] and assume there exist temporally adjacent frames I_τ for τ ∈ T := {t − 1, t + 1}, denoting the previous and the next time stamps relative to I_t, available during training.…”
Section: Method Formulation
confidence: 99%
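The setup described in this excerpt can be sketched concretely: an RGB image defined on the image domain Ω, and sparse depth measurements defined only on a subset Ω_z, which in practice is encoded as a dense array with a validity mask. The array sizes, point count, and depth range below are illustrative assumptions, not values from the paper, though the point count reflects the "few hundred to a few thousand" VIO points the abstract mentions.

```python
import numpy as np

H, W = 480, 640
rng = np.random.default_rng(0)

# RGB image I_t : Omega -> R^3_+ (random stand-in for a real frame).
I_t = rng.random((H, W, 3))

# Sparse depth from VIO: ~1500 points, far fewer than a lidar sweep.
num_points = 1500
rows = rng.integers(0, H, num_points)
cols = rng.integers(0, W, num_points)

z = np.zeros((H, W))                 # z : Omega_z -> R_+, zero elsewhere
mask = np.zeros((H, W), dtype=bool)  # indicator of the subset Omega_z
z[rows, cols] = rng.uniform(0.5, 10.0, num_points)
mask[rows, cols] = True

# Well under 1% of pixels carry a measurement, which is why these points
# alone cannot inform the topology of the scene.
density = mask.mean()
```

The validity mask is what lets a training loss be evaluated only where measurements exist, while the network still predicts depth at every pixel.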
“…Unsupervised depth completion assumes additional data (stereo or temporally consecutive frames) is available during training. Both the stereo [26], [36] and monocular [21], [31], [32], [33] paradigms learn dense depth from an image and sparse depth measurements by minimizing the photometric error between the input image and its reconstruction from other views, along with the difference between the prediction and the sparse depth input (sparse depth reconstruction). [21] used Perspective-n-Point [19] and RANSAC [9] to align consecutive frames, and [34], [32] proposed an adaptive weighting framework.…”
Section: Related Work