Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Casser, Vincent; Pirk, Soeren; Mahjourian, Reza; Angelova, Anelia

doi:10.1609/aaai.v33i01.33018001

Cited by 452 publications

(411 citation statements)

References 19 publications

Supporting

Mentioning

396

Contrasting

Order By: Relevance

“…The CNNs is trained by a photometric reconstruction loss, which was obtained by warping nearby views to the target using the computed depth and pose. In addition, [21] addresses unsupervised learning of scene depth, robot ego-motion and object motions where the supervision is provided by geometric structure of monocular videos as input. Furthermore, many recent efforts [11], [22] explore the geometric relationships between depth, camera motion, and flow for unsupervised learning of depth and flow estimation models.…”

Section: Unsupervised Learning Methodsmentioning

confidence: 99%

DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints

Han¹,

Lin²,

Du³

et al. 2019

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

View full text Add to dashboard Cite

This paper presents an self-supervised deep learning network for monocular visual inertial odometry (named DeepVIO). DeepVIO provides absolute trajectory estimation by directly merging 2D optical flow feature (OFF) and Inertial Measurement Unit (IMU) data. Specifically, it firstly estimates the depth and dense 3D point cloud of each scene by using stereo sequences, and then obtains 3D geometric constraints including 3D optical flow and 6-DoF pose as supervisory signals. Note that such 3D optical flow shows robustness and accuracy to dynamic objects and textureless environments. In DeepVIO training, 2D optical flow network is constrained by the projection of its corresponding 3D optical flow, and LSTMstyle IMU preintegration network and the fusion network are learned by minimizing the loss functions from ego-motion constraints. Furthermore, we employ an IMU status update scheme to improve IMU pose estimation through updating the additional gyroscope and accelerometer bias. The experimental results on KITTI and EuRoC datasets show that DeepVIO outperforms state-of-the-art learning based methods in terms of accuracy and data adaptability. Compared to the traditional methods, DeepVIO reduces the impacts of inaccurate Camera-IMU calibrations, unsynchronized and missing data.

show abstract

Section: Unsupervised Learning Methodsmentioning

confidence: 99%

DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints

Han¹,

Lin²,

Du³

et al. 2019

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

View full text Add to dashboard Cite

show abstract

“…Vijayanarasimhan et al [41] additionally learn rigid motion parameters of multiple objects. Subsequent methods further proposed improvements based on various techniques, such as an ICP alignment loss [30], supervision from SfM algorithms [26], optical flow [48,52], edges [47], modeling multiple rigid motions informed by instance segmentation [3], motion segmentation [36], minimum projection loss [16], etc.…”

Section: Related Workmentioning

confidence: 99%

“…Frames that are similar to the test scenes are removed from the training set. We compare the performance of the proposed framework with the baseline, as well as recent stateof-the-art works in the same setting [3,30,42,47,48,51,52].…”

Section: Depth Estimationmentioning

confidence: 99%

“…We notice that the online refinement is not used in other methods except [3]. Thus to facilitate the comparison, we also report the results of GLNet without online refinement, denoted as GLNet (-ref.).…”

Section: Depth Estimationmentioning

confidence: 99%

“…We also conduct an ablation study over refinement strategies. We use the normal parameter finetuning without regularizer as baseline, as in [3]. We compare the baseline with the two proposed refinement strategies, i.e.…”

Section: Online Refinementmentioning

confidence: 99%

See 2 more Smart Citations

Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera

Chen

Schmid

Sminchisescu

2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

260

179

View full text Add to dashboard Cite

We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video -addressing the difficulty of acquiring realistic ground-truth for such tasks. We propose three contributions: 1) we design new loss functions that capture multiple geometric constraints (e.g. epipolar geometry) as well as an adaptive photometric loss that supports multiple moving objects, rigid and non-rigid, 2) we extend the model such that it predicts camera intrinsics, making it applicable to uncalibrated video, and 3) we propose several online refinement strategies that rely on the symmetry of our self-supervised loss in training and testing, in particular optimizing model parameters and/or the output of different tasks, thus leveraging their mutual interactions. The idea of jointly optimizing the system output, under all geometric and photometric constraints can be viewed as a dense generalization of classical bundle adjustment. We demonstrate the effectiveness of our method on KITTI and Cityscapes, where we outperform previous self-supervised approaches on multiple tasks. We also show good generalization for transfer learning in YouTube videos.

show abstract

Feature-Metric Loss for Self-supervised Learning of Depth and Egomotion

Shu¹,

Yu²,

Duan³

et al. 2020

Lecture Notes in Computer Science

184

135

View full text Add to dashboard Cite

Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Cited by 452 publications

References 19 publications

DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints

DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints

Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera

Feature-Metric Loss for Self-supervised Learning of Depth and Egomotion

Contact Info

Product

Resources

About