2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793637

Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera

Abstract: Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces 3 main challenges: the irregularly spaced pattern in the sparse depth input, the difficulty in handling multiple sensor modalities (when color images are available), as well as the lack of dense, pixel-level ground truth depth labels. In this work, we address all these challenges. Specifically, we develop a deep regre…

Cited by 425 publications (602 citation statements)
References 42 publications
“…However, the assistance from other modalities, e.g., color images, can significantly improve the completion accuracy. Ma et al. concatenated the sparse depth and color image as the inputs of an off-the-shelf network [26] and further explored the feasibility of self-supervised LiDAR completion [23]. Moreover, [14,16,33,4] proposed different network architectures to better exploit the potential of the encoder-decoder framework.…”
Section: Related Work (mentioning)
confidence: 99%
“…With the advances of deep learning methods, many depth completion approaches based on convolutional neural networks (CNNs) have been proposed. The mainstream of these methods is to directly input the sparse depth maps (with/without color images) into an encoder-decoder network and predict dense depth maps [26,16,36,15,10,23,2]. These black-box methods force the CNN to learn a mapping from sparse depth measurements to dense maps, which is generally a challenging task and leads to unsatisfactory completion results, as shown in Fig.…”
Section: Introduction (mentioning)
confidence: 99%
“…The predicted distance to a stop sign had a standard deviation of 1.7 m and the predicted distance to a traffic light had a standard deviation of 5.9 m. Given the scope of this study, these error values were considered acceptable. However, to more accurately predict these distance values, a more sophisticated technique could be used, such as incorporating a stereo camera or using a monocular depth estimation approach. For this study, the predicted distance to a stop sign was considered an approximate distance to the start of the intersection, and the predicted distance to a traffic light was considered an approximate distance to the end of the intersection.…”
Section: System Architecture (mentioning)
confidence: 99%
“…However, to more accurately predict these distance values, a more sophisticated technique could be used, such as incorporating a stereo camera or using a monocular depth estimation approach [34]. For this study, the predicted distance to a stop sign was considered an approximate distance to the start of the intersection, and the predicted distance to a traffic light was considered an approximate distance to the end of the intersection. For measurements that returned multiple detections, the mean predicted distance was used.…”
Section: Intersection Estimator (mentioning)
confidence: 99%
“…Interpolation techniques have been widely used in lots of computer vision and robotics tasks, which can be classified into two categories, i.e., temporal interpolation [1], [8], [14] and spatial interpolation [10], [12], [26]. In video processing, video interpolation aims to temporally generate an intermediate frame using two consecutive frames.…”
Section: Introduction (mentioning)
confidence: 99%
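
Several of the citation statements above describe feeding the sparse depth map, with or without the color image, directly into an encoder-decoder network. The input step they mention (concatenating sparse depth and color) can be sketched roughly as follows. This is a minimal illustration and not the cited paper's implementation: the function names `fuse_rgbd`, `valid_mask`, and `depth_density` are hypothetical, images are plain nested lists rather than tensors, and pixels without a LiDAR return are assumed to be encoded as depth 0.0.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[float, float, float]]]
DepthMap = List[List[float]]
RGBDImage = List[List[Tuple[float, float, float, float]]]

# Assumption: a pixel with no LiDAR return is stored as depth 0.0.
NO_RETURN = 0.0


def fuse_rgbd(rgb: RGBImage, sparse_depth: DepthMap) -> RGBDImage:
    """Concatenate color and sparse depth into one 4-channel (R, G, B, D) image."""
    if len(rgb) != len(sparse_depth):
        raise ValueError("rgb and sparse_depth must have the same height")
    fused: RGBDImage = []
    for rgb_row, depth_row in zip(rgb, sparse_depth):
        if len(rgb_row) != len(depth_row):
            raise ValueError("rgb and sparse_depth must have the same width")
        # Append the depth value as a fourth channel next to R, G, B.
        fused.append([(r, g, b, d) for (r, g, b), d in zip(rgb_row, depth_row)])
    return fused


def valid_mask(sparse_depth: DepthMap) -> List[List[bool]]:
    """Mark which pixels carry an actual depth measurement."""
    return [[d != NO_RETURN for d in row] for row in sparse_depth]


def depth_density(sparse_depth: DepthMap) -> float:
    """Fraction of pixels that have a depth measurement (0.0 for an empty map)."""
    mask = valid_mask(sparse_depth)
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total
```

In this sketch, the fused 4-channel image stands in for the network input, while the validity mask is the kind of side information a training loss could use to compare predictions only where a measurement exists. Whether a given method uses such a mask is not stated in the excerpts above.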