2018 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2018.8461116

Fusion of Stereo and Still Monocular Depth Estimates in a Self-Supervised Learning Context

Abstract: We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image to a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high confidence stereo estimates, while leveraging the CNN estimates in the low-confidence region…
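The abstract does not spell out the fusion rule itself. Below is a minimal sketch, assuming a per-pixel stereo confidence map in [0, 1] and a hard threshold `tau` (both hypothetical), of how high-confidence stereo estimates could be preserved while CNN estimates fill in the low-confidence regions; this is an illustration, not the paper's exact method.

```python
import numpy as np

def fuse_depth(stereo_depth, mono_depth, stereo_conf, tau=0.8):
    """Fuse stereo and monocular depth maps (H x W float arrays).

    Pixels whose stereo confidence exceeds `tau` keep the stereo
    estimate; elsewhere the two estimates are blended, weighted by
    the stereo confidence. Both `tau` and the soft blend are
    assumptions, not the rule from the paper.
    """
    blend = stereo_conf * stereo_depth + (1.0 - stereo_conf) * mono_depth
    return np.where(stereo_conf >= tau, stereo_depth, blend)

# Example usage with random data standing in for real depth maps.
h, w = 4, 5
stereo = np.random.uniform(1.0, 10.0, (h, w))
mono = np.random.uniform(1.0, 10.0, (h, w))
conf = np.random.uniform(0.0, 1.0, (h, w))
fused = fuse_depth(stereo, mono, conf)
print(fused.shape)  # (4, 5)
```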

Cited by 22 publications (12 citation statements)
References 22 publications
“…Table 7 reports experiments on sequence 2011_09_26_0011 of the KITTI raw dataset [6]. We compare our framework with the fusion strategies proposed by Martins et al [18] and Marin et al [17], combining the outputs of the stereo networks with monocular estimates (using the network by Guo et al [9]) and Lidar, respectively, and reporting the ideal result as in [17]. Ground-truth labels for evaluation are provided by [39].…”
Section: Experiments With Lidar Measurements
confidence: 99%
“…[flattened table fragment: <2% and avg. error, each for All and NoG columns: iResNet [14] 18.42, 18.37, 1.28, 1.28; iResNet+Martins et al [18] 18…] is not limited to pixels with an associated Lidar measurement, in contrast to fusion techniques [17].…”
Section: Experiments With Lidar Measurements
confidence: 99%
“…The image is synthesized from the network outputs, following the traditional Structure-from-Motion procedure. Extra constraints and additional information have been introduced to improve performance, such as temporal depth consistency [22], stereo matching [23], and semantic information [34]. Godard et al [11] achieved a significant improvement by compensating for image occlusion.…”
Section: Supervised Depth Estimation
confidence: 99%
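The view-synthesis supervision described in the statement above is commonly implemented as a photometric reconstruction loss between the target image and an image warped from another view. A minimal sketch follows, using a plain L1 penalty; the function name and array conventions are illustrative, and the cited works typically add SSIM terms and occlusion handling, which are omitted here.

```python
import numpy as np

def photometric_l1(target, synthesized, valid_mask=None):
    """Mean absolute photometric error between the target image and an
    image synthesized (warped) from another view.

    `target` and `synthesized` are H x W x 3 float arrays in [0, 1];
    `valid_mask` is an optional H x W boolean array that excludes
    occluded or out-of-view pixels from the loss.
    """
    err = np.abs(target - synthesized).mean(axis=-1)  # per-pixel error
    if valid_mask is not None:
        err = err[valid_mask]
    return err.mean()

# Example usage with random images standing in for real views.
tgt = np.random.rand(4, 5, 3)
syn = np.random.rand(4, 5, 3)
mask = np.ones((4, 5), dtype=bool)
print(photometric_l1(tgt, syn, mask))
```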
“…Similarly, Fácil et al [58] fused CNN-based single-view and multi-view depth to improve depth estimates for low-parallax image sequences. Martins et al [59] demonstrated that fusing stereo depth with monocular depth estimates leads to higher performance.…”
Section: B. Depth Estimation
confidence: 99%