2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00181

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Abstract: The accuracy of depth estimation from static images has been significantly improved recently by exploiting hierarchical features from deep convolutional neural networks (CNNs). Compared with static images, video frames carry vast additional information that can be exploited to improve depth estimation performance. In this work, we focus on exploring temporal information from monocular videos for depth estimation. Specifically, we take advantage of convolutional long short-term memory (CLSTM) and propose a novel…
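The abstract is truncated above, but its core idea is a convolutional LSTM that propagates spatial features across video frames. As a minimal sketch of that building block (not the authors' implementation; the channel sizes and single fused-gate convolution are assumptions), a ConvLSTM cell in PyTorch might look like:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: an LSTM whose gates are 2-D convolutions,
    so the hidden state keeps spatial structure across video frames."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # One convolution produces all four gates (i, f, o, g) at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)
        self.hid_ch = hid_ch

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)),
                                 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```

Because the gates are convolutions rather than fully connected layers, the recurrent state remains a feature map, which is what allows a decoder to produce a dense depth map at every time step.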

Cited by 114 publications (97 citation statements) | References 38 publications
“…Our method is designed for online depth estimation in videos. A similar idea of online estimation is proposed in recent works [30], [31], which use ConvLSTM but in a supervised framework. There are also earlier methods for offline depth estimation from videos [32], in which local motion cues and optical flow are used to produce temporally consistent depth maps.…”
Section: Related Work
confidence: 91%
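"Online" in this excerpt means the depth for frame t depends only on frames up to t, which falls out naturally from carrying the recurrent state forward. A hedged sketch of such an inference loop, reusing the ConvLSTMCell above (the encoder and decoder names are placeholders, not the cited models' APIs):

```python
import torch

@torch.no_grad()
def online_video_depth(frames, encoder, clstm_cell, decoder):
    """Estimate depth frame-by-frame, reusing the ConvLSTM state so each
    prediction depends only on past frames (online, causal inference)."""
    state = None
    depths = []
    for frame in frames:                      # frame: (1, 3, H, W)
        feat = encoder(frame)                 # per-frame spatial features
        if state is None:                     # lazy init on the first frame
            b, _, h, w = feat.shape
            zeros = torch.zeros(b, clstm_cell.hid_ch, h, w,
                                device=feat.device)
            state = (zeros, zeros)
        h_t, state = clstm_cell(feat, state)  # temporal feature fusion
        depths.append(decoder(h_t))           # dense depth for frame t
    return depths
```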
“…7. The quantitative metrics defined by [30] are not suitable for datasets with sparse ground truth. We define our evaluation metric, Absolute Relative Temporal Error (ARTE), as follows: ….”
Section: B. Ablation Study
confidence: 99%
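The ARTE formula itself is elided in the excerpt. Purely as a hypothetical illustration of what a temporal error restricted to sparse ground truth could measure (the cited paper's actual definition may differ), one could compare the predicted depth change between consecutive frames against the ground-truth change at pixels that are valid in both frames:

```python
import numpy as np

def arte_like(pred_t, pred_t1, gt_t, gt_t1, eps=1e-6):
    """HYPOTHETICAL stand-in for ARTE (the exact formula is not shown in
    the excerpt): absolute relative error of the temporal depth change,
    evaluated only where sparse ground truth exists in both frames."""
    valid = (gt_t > 0) & (gt_t1 > 0)          # sparse-GT validity mask
    d_pred = pred_t1[valid] - pred_t[valid]   # predicted temporal change
    d_gt = gt_t1[valid] - gt_t[valid]         # ground-truth temporal change
    return np.mean(np.abs(d_pred - d_gt) / (np.abs(d_gt) + eps))
```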
“…Meanwhile, they compute not only a short-term temporal loss but also a long-term temporal loss over frame pairs sampled from the output sequence. Zhang et al. [48] propose an ST-CLSTM structure which extracts both spatial features and temporal correlations to retain temporal consistency. In this paper, we teach our network to learn temporal consistency from the original videos.…”
Section: Related Work
confidence: 99%
“…The resulting combination of global and local feature cues is then used to form the cost volume for the subsequent disparity estimation. Recently, more and more works [49]–[53] have concentrated on the field of monocular depth estimation. Godard et al. [49] used self-supervised learning to train models, addressing the problem that per-pixel ground-truth depth data is challenging to acquire at scale.…”
Section: Related Work
confidence: 99%