2020
DOI: 10.1007/978-3-030-58565-5_35

Self-supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance

Citations: cited by 266 publications (129 citation statements)
References: 60 publications
“…Our method does not require MVS depth as input, only camera poses, and does not require training a fusion module. In addition, some methods [Klingner et al 2020;Patil et al 2020] use semantic information to guide depth prediction and identify moving objects during training. Those methods usually focuses on specific scenarios such as autonomous driving.…”
Section: Related Work (mentioning)
confidence: 99%
“…For some ablation experiments, we also use the PSPNet architecture [80], using a pyramid pooling module as decoder without skip connections, and the ERFNet architecture [81], using efficient residual connections and factorized convolutions. We simply double the decoder architecture for the second decoder head to obtain a multi-task network as described in [52], while adapting the full-resolution multi-scale depth loss formulation proposed by [4]. We apply the gradient scaling described by (9) at all connections (particularly, also at skip connections) between the encoder and the decoder with a scaling factor of λ = 0.1.…”
Section: B Experimental Setup (mentioning)
confidence: 99%
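
The gradient scaling mentioned in the statement above can be illustrated with a small sketch. Equation (9) is not reproduced in this report, so the code assumes the common form of such a layer: features pass through unchanged in the forward pass, and the gradient flowing back from a decoder to the encoder is multiplied by λ. The class name `GradientScale`, the toy feature values, and the decoder gradient are hypothetical and not taken from the cited work.

```python
class GradientScale:
    """Toy gradient-scaling layer (assumed form, not the cited equation (9)).

    Forward pass: identity. Backward pass: gradient multiplied by `lam`.
    """

    def __init__(self, lam):
        self.lam = lam

    def forward(self, features):
        # Encoder features reach the decoder unchanged.
        return list(features)

    def backward(self, grad_output):
        # The gradient returned to the encoder is damped by lam.
        return [self.lam * g for g in grad_output]


# Scaling factor quoted in the citation statement.
scale = GradientScale(lam=0.1)

encoder_features = [1.0, -2.0, 0.5]      # hypothetical encoder output
decoder_input = scale.forward(encoder_features)

decoder_grad = [0.4, 0.2, -1.0]          # hypothetical gradient from a decoder
encoder_grad = scale.backward(decoder_grad)
```

In a real multi-task network one such layer would sit at every encoder-to-decoder connection, including skip connections, as the statement describes; an autograd framework would apply the backward rule automatically.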
“…The introduction of semantic cues into depth estimation has been explored by prior works. One direction is to use segmentation masks to mask out pixels belonging to dynamic classes so that they would not contribute to the optimization process [6,8]. The other one is coupling a semantic segmentation network into the depth estimation task to take advantage of semantic features.…”
Section: Incorporating Semantic Information (mentioning)
confidence: 99%
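
The first direction in the statement above, masking out pixels of dynamic classes so that they do not contribute to the optimization, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `masked_mean_loss`, the flat per-pixel lists, and the class IDs 11 and 13 are hypothetical and do not come from the cited works.

```python
def masked_mean_loss(pixel_losses, class_ids, dynamic_classes):
    """Average the per-pixel loss over pixels that are not in a dynamic class.

    pixel_losses:    per-pixel loss values (flattened image)
    class_ids:       predicted semantic class per pixel, same length
    dynamic_classes: set of class IDs treated as potentially moving
    """
    kept = [
        loss
        for loss, class_id in zip(pixel_losses, class_ids)
        if class_id not in dynamic_classes
    ]
    if not kept:
        # Every pixel was masked out; contribute nothing to the objective.
        return 0.0
    return sum(kept) / len(kept)


# Hypothetical example: IDs 11 and 13 stand in for dynamic classes.
losses = [0.2, 0.9, 0.4, 0.8]
labels = [0, 13, 0, 11]
loss = masked_mean_loss(losses, labels, dynamic_classes={11, 13})
```

Only the two pixels labelled 0 enter the average here, so the large losses on the two dynamic-class pixels do not influence training.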
“…The proposed method recomposes the training objective with semantic cues and geometry constraints in a self-supervised manner, making it more credible towards ambiguous pixels. Meanwhile, since the segmentation network only contributes in training phase and no intermediate output is fused into depth network, the proposed method does not introduce extra inference complexity compared to prior works [6,7,8,9]. Our main contribution can be concluded as:…”
Section: Introduction (mentioning)
confidence: 99%