2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00471
Distilled Semantics for Comprehensive Scene Understanding from Videos

Cited by 75 publications (40 citation statements)
References 69 publications
“…On both monocular depth estimation and monocular visual odometry by the unsupervised learning of depth and pose, our method outperforms recent state-of-the-art methods. Besides, our method does not need auxiliary tasks, such as optical flow estimation [9]–[12], [26], [39], semantic segmentation [39], dynamic mask estimation [11], [26], or normal estimation [8]. This paper realizes end-to-end iterative view synthesis and pose refinement to jointly optimize the pose and depth estimation networks, allowing the overall parameters to self-learn according to the optimization objective.…”
Section: B. Evaluation Results (mentioning)
confidence: 99%
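The "view synthesis" objective referenced in the statement above is not spelled out on this page. The sketch below is a minimal, assumed PyTorch rendition of the standard photometric view-synthesis loss used in self-supervised depth and pose learning: the target-view depth and the relative pose are used to inverse-warp the source frame into the target view, and the photometric error drives both networks. The function names (`inverse_warp`, `photometric_loss`), tensor shapes, and the plain L1 error are illustrative choices, not code from the cited papers.

```python
import torch
import torch.nn.functional as F

def inverse_warp(src_img, depth, pose, K):
    """Warp the source frame into the target view using predicted depth and pose.

    src_img: (B, 3, H, W) source frame
    depth:   (B, 1, H, W) predicted depth of the target frame
    pose:    (B, 4, 4) relative camera pose (target -> source)
    K:       (B, 3, 3) pinhole intrinsics
    """
    B, _, H, W = depth.shape
    dev, dt = depth.device, depth.dtype
    # Homogeneous pixel grid of the target view.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=dev, dtype=dt),
        torch.arange(W, device=dev, dtype=dt),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1).expand(B, 3, -1)
    # Back-project to 3D, move to the source camera, re-project to pixels.
    cam = (torch.linalg.inv(K) @ pix) * depth.reshape(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=dev, dtype=dt)], dim=1)
    src_cam = (pose @ cam_h)[:, :3]
    src_pix = K @ src_cam
    # clamp avoids division by zero for points projecting near the image plane
    x = src_pix[:, 0] / src_pix[:, 2].clamp(min=1e-6)
    y = src_pix[:, 1] / src_pix[:, 2].clamp(min=1e-6)
    # Normalise coordinates to [-1, 1] as expected by grid_sample.
    grid = torch.stack([2 * x / (W - 1) - 1, 2 * y / (H - 1) - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

def photometric_loss(tgt_img, src_img, depth, pose, K):
    """Mean L1 photometric error between the target frame and the warped source frame."""
    return torch.abs(tgt_img - inverse_warp(src_img, depth, pose, K)).mean()
```

Minimising this error with respect to both the depth and pose network outputs is what allows the parameters to "self-learn" from raw video, without ground-truth depth or pose; published systems typically add robustness terms (e.g. SSIM, smoothness, masking) on top of this basic formulation.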
“…Recently, deep learning-based methods have dominated this field because of their powerful feature representation capacity. There are roughly two types of deep learning-based MDE: supervised methods [10], [11], [12], [13] and self-supervised (unsupervised) methods [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25].…”
Section: Related Work (mentioning)
confidence: 99%
“…As camera parameters are learnt in a fully self-supervised manner, even image sequences from the wild can be used, where the network generalizes better. Later, other works applicable to image sequences from the wild, such as [8] and [51], were proposed, which followed an approach similar to [22] for camera parameter estimation. Our approach is similar to these, but does not need an object motion mask as input [22,51] or online refinement [8] to output accurate depth maps, while assuming a pinhole camera model with minimal or no distortion.…”
Section: Related Work (mentioning)
confidence: 99%
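The statement above refers to learning the pinhole camera parameters jointly with depth and pose, so that uncalibrated "in the wild" videos become usable for training. The following is a minimal, assumed sketch of such an intrinsics head regressing (fx, fy, cx, cy) from a pose-network feature vector; the class name `IntrinsicsHead`, the `feat_dim` size, and the choice of activations are illustrative assumptions, not the architecture of any of the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntrinsicsHead(nn.Module):
    """Regress pinhole intrinsics (fx, fy, cx, cy) from a pose-network feature vector."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)

    def forward(self, feat: torch.Tensor, width: int, height: int) -> torch.Tensor:
        fx, fy, cx, cy = self.fc(feat).unbind(dim=-1)
        # Softplus keeps the focal lengths positive; sigmoid keeps the principal
        # point inside the image. Scaling by width/height makes the prediction
        # resolution-independent.
        fx = F.softplus(fx) * width
        fy = F.softplus(fy) * height
        cx = torch.sigmoid(cx) * width
        cy = torch.sigmoid(cy) * height
        zero, one = torch.zeros_like(fx), torch.ones_like(fx)
        # Assemble the (B, 3, 3) intrinsics matrices row by row.
        K = torch.stack([
            torch.stack([fx, zero, cx], dim=-1),
            torch.stack([zero, fy, cy], dim=-1),
            torch.stack([zero, zero, one], dim=-1),
        ], dim=-2)
        return K
```

Because the predicted K feeds directly into the view-synthesis warp, gradients from the photometric loss also supervise the intrinsics, which is what removes the need for pre-calibrated data in these approaches.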
“…This eliminates the plausible usage of potential data from the wild for training. Previous works [8,22,51] have addressed such issues in the past; however, they are not on par with the other approaches which use ground-truth calibrated data. In our work, we primarily address this issue by removing the necessity of pre-calibrated data and also focus on refining monocular depth estimation accuracy.…”
Section: Introduction (mentioning)
confidence: 99%