Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

Zhou, Hang; Greenwood, David A.; Taylor, Sarah L.; Gong, Han

doi:10.1145/3429341.3429355

Cited by 20 publications

(20 citation statements)

References 29 publications


“…Supervised depth estimation has a significant body of work [19], [20], but requires pixel-wise depth labeling. Self-supervised learning has been applied to monocular depth estimation, achieving better performance than supervised methods [21], [22]. Further research incorporates semantic information [23], [24], but monocular depth estimation remains challenging due to its ill-posed nature.…”

Section: B Self-supervised Depth Estimation (mentioning)

confidence: 99%

Spatial Group-Wise Enhance: Enhancing Semantic Feature Learning in CNN

Yang

2023

Lecture Notes in Computer Science


Mobile ground robots require perceiving and understanding their surrounding support surface to move around autonomously and safely. The support surface is commonly estimated based on exteroceptive depth measurements, e.g., from LiDARs. However, the measured depth fails to align with the true support surface in the presence of high grass or other penetrable vegetation. In this work, we present the Semantic Pointcloud Filter (SPF), a Convolutional Neural Network (CNN) that learns to adjust LiDAR measurements to align with the underlying support surface. The SPF is trained in a semi-self-supervised manner and takes as an input a LiDAR pointcloud and RGB image. The network predicts a binary segmentation mask that identifies the specific points requiring adjustment, along with estimating their corresponding depth values. To train the segmentation task, 300 distinct images are manually labeled into rigid and non-rigid terrain. The depth estimation task is trained in a self-supervised manner by utilizing the future footholds of the robot to estimate the support surface based on a Gaussian process. Our method can correctly adjust the support surface prior to interacting with the terrain and is extensively tested on the quadruped robot ANYmal. We show the qualitative benefits of SPF in natural environments for elevation mapping and traversability estimation compared to using raw sensor measurements and existing smoothing methods. Quantitative analysis is performed in various natural environments, and an improvement by 48% RMSE is achieved within a meadow terrain.


“…Furthermore, due to the ambiguous nature of photometric loss, the depth can only be predicted up to an unknown scale factor. Since then, a number of works [2,17,19,20,37,43,45,64,75,82,83,86] advanced the field considerably. For example, Godard et al [17] propose to upsample the multi-scale depth maps before loss calculation and use the minimum photometric error to tackle occlusions.…”

Section: Self-supervised Monocular Depth Estimation (mentioning)

confidence: 99%
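
The minimum photometric error mentioned in the statement above can be illustrated with a minimal sketch. It assumes the source frames have already been warped into the target view and uses a plain absolute difference on grey-level values as the error, which is a simplification; the function names and the toy 2x2 inputs are hypothetical and not taken from any of the cited papers.

```python
def min_photometric_error(target, warped_sources):
    """Average, over pixels, of the smallest absolute error across source views.

    target: H x W nested list of grey-level floats (hypothetical toy input).
    warped_sources: list of H x W nested lists, assumed to be already warped
        into the target view; the warping itself is not shown here.
    """
    total, count = 0.0, 0
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            # Each pixel keeps only its best-matching view, so a view in which
            # the pixel is occluded does not contribute to the loss.
            total += min(abs(value - src[r][c]) for src in warped_sources)
            count += 1
    return total / count


def mean_photometric_error(target, warped_sources):
    """Baseline for comparison: average the error over all source views."""
    total, count = 0.0, 0
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            errors = [abs(value - src[r][c]) for src in warped_sources]
            total += sum(errors) / len(errors)
            count += 1
    return total / count


target = [[0.0, 0.0], [0.0, 0.0]]
prev_view = [[1.0, 0.0], [0.0, 0.0]]  # top-left pixel mismatched in this view
next_view = [[0.0, 0.0], [0.0, 1.0]]  # bottom-right pixel mismatched in this view
min_loss = min_photometric_error(target, [prev_view, next_view])
mean_loss = mean_photometric_error(target, [prev_view, next_view])
```

In this toy case each mismatched pixel is matched perfectly by the other view, so the per-pixel minimum ignores it while the averaged baseline still penalises it.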

“…The generated depth images are much more accurate, but very sparse. Therefore, recent methods follow a self-supervised training strategy using either stereo images [14,16], video sequences [2,19,20,45,64,75,82-84,86], or both [17,37,43] during training. The training objective is formulated as an image synthesis problem based on geometric constraints.…”

Section: Introduction (mentioning)

confidence: 99%

MGNet: Monocular Geometric Scene Understanding for Autonomous Driving

Schön

Buchholz

Dietmayer

2021

2021 IEEE/CVF International Conference on Computer Vision (ICCV)


We introduce MGNet, a multi-task framework for monocular geometric scene understanding. We define monocular geometric scene understanding as the combination of two known tasks: Panoptic segmentation and self-supervised monocular depth estimation. Panoptic segmentation captures the full scene not only semantically, but also on an instance basis. Self-supervised monocular depth estimation uses geometric constraints derived from the camera measurement model in order to measure depth from monocular video sequences only. To the best of our knowledge, we are the first to propose the combination of these two tasks in one single model. Our model is designed with focus on low latency to provide fast inference in real-time on a single consumer-grade GPU. During deployment, our model produces dense 3D point clouds with instance aware semantic labels from single high-resolution camera images. We evaluate our model on two popular autonomous driving benchmarks, i.e., Cityscapes and KITTI, and show competitive performance among other real-time capable methods. Source code is available at https://github.com/markusschoen/MGNet.


“…Supervised depth estimation approaches [1,5,8,10,31,35,38] can predict dense depth maps but require costly labelled depth ground truth. In contrast, self-supervised approaches require no labelled data [12,13,23,30,34,39,41,46-49] and are performing competitively at this task. At a high level, self-supervised depth estimation approaches use a depth networks' output as an intermediate representation for a stereo matching problem or an image reconstruction task.…”

Section: Introduction (mentioning)

confidence: 99%

“…At a high level, self-supervised depth estimation approaches use a depth networks' output as an intermediate representation for a stereo matching problem or an image reconstruction task. For the latter, these approaches [13,34,43,47-49] are trained with a self-supervised monocular depth estimation (SDE) framework.…”

Section: Introduction (mentioning)

confidence: 99%

SUB-Depth: Self-distillation and Uncertainty Boosting Self-supervised Monocular Depth Estimation

Zhou

Taylor

Greenwood

2021

Preprint

Self Cite


We propose SUB-Depth, a universal multi-task training framework for self-supervised monocular depth estimation (SDE). Depth models trained with SUB-Depth outperform the same models trained in a standard single-task SDE framework. By introducing an additional self-distillation task into a standard SDE training framework, SUB-Depth trains a depth network, not only to predict the depth map for an image reconstruction task, but also to distill knowledge from a trained teacher network with unlabelled data. To take advantage of this multi-task setting, we propose homoscedastic uncertainty formulations for each task to penalise areas likely to be affected by teacher network noise, or violate SDE assumptions. We present extensive evaluations on KITTI to demonstrate the improvements achieved by training a range of existing networks using the proposed framework, and we achieve state-of-the-art performance on this task. Additionally, SUB-Depth enables models to estimate uncertainty on depth output.
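
The abstract above does not spell out its uncertainty formulation. As a rough illustration of the general idea only, the sketch below uses a widely used homoscedastic weighting for multi-task losses, in which each task i has a learnable log-variance s_i and contributes 0.5 * exp(-s_i) * L_i + 0.5 * s_i. This choice of formula, the function name, and the example loss values are assumptions and should not be read as SUB-Depth's actual method.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses with homoscedastic uncertainty weights.

    task_losses: list of scalar losses L_i (hypothetical values here).
    log_variances: list of s_i = log(sigma_i ** 2), one per task; in training
        these would be learnable parameters, here they are plain floats.
    """
    return sum(
        # exp(-s) down-weights a noisy task; the + s term stops s growing freely.
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_variances)
    )


# Two illustrative tasks, e.g. image reconstruction and self-distillation.
equal_weighting = uncertainty_weighted_loss([0.8, 0.2], [0.0, 0.0])
```

With all log-variances at zero the result reduces to half the sum of the task losses; for a single task with fixed loss L, the expression is minimised at s = log(L).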
