Learning Visual Odometry with a Convolutional Network

Konda, Kishore Reddy; Memisevic, Roland

doi:10.5220/0005299304860490

Cited by 140 publications

(90 citation statements)

References 7 publications

“…Numerous approaches [1,14,21,24,25] learn the task of VO estimation using ground-truth data available in the form of global-camera poses, recorded by high-precision GPS+IMU rigs.…”

Section: Supervised Approaches (mentioning)

confidence: 99%

“…Konda et al. [14] first proposed an autoencoder to learn a latent representation of the optical flow between camera frames jointly with the ego-motion estimation task. Kendall et al.…”

Section: Supervised Approaches (mentioning)

confidence: 99%

“…In contrast, deep learning methods learn feature representations instead of handcrafting them. Consequently, they have been applied to the problem of visual place recognition for SLAM (discovering that we are in a previously visited place) [16] as well as VO [1,14,15,21,24,25,26,26].…”

Section: Introduction (mentioning)

confidence: 99%

Geometric Consistency for Self-Supervised End-to-End Visual Odometry

Iyer, Murthy, Gupta, et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

With the success of deep learning based approaches in tackling challenging problems in computer vision, a wide range of deep architectures have recently been proposed for the task of visual odometry (VO) estimation. Most of these proposed solutions rely on supervision, which requires the acquisition of precise ground-truth camera pose information, collected using expensive motion capture systems or high-precision IMU/GPS sensor rigs. In this work, we propose an unsupervised paradigm for deep visual odometry learning. We show that using a noisy teacher, which could be a standard VO pipeline, and by designing a loss term that enforces geometric consistency of the trajectory, we can train accurate deep models for VO that do not require ground-truth labels. We leverage geometry as a self-supervisory signal and propose "Composite Transformation Constraints (CTCs)", that automatically generate supervisory signals for training and enforce geometric consistency in the VO estimate. We also present a method of characterizing the uncertainty in VO estimates thus obtained. To evaluate our VO pipeline, we present exhaustive ablation studies that demonstrate the efficacy of end-to-end, self-supervised methodologies to train deep models for monocular VO. We show that leveraging concepts from geometry and incorporating them into the training of a recurrent neural network results in performance competitive to supervised deep VO methods.

“…This allows the freedom to model various unconstrained and partially constrained motions that typically affect the overall robustness of existing ego-motion algorithms. While model-based approaches have shown tremendous progress in accuracy, robustness, and run-time performance, a few recent data-driven approaches have been shown to produce equally compelling results [20], [22], [24]. An adaptive and trainable solution for relative pose estimation or ego-motion can be especially advantageous for several reasons: (i) a general-purpose end-to-end trainable model architecture that applies to a variety of camera optics including pinhole, fisheye, and catadioptric lenses; (ii) simultaneous and continuous optimization over both ego-motion estimation and camera parameters (intrinsics and extrinsics that are implicitly modeled); and (iii) joint reasoning over resource-aware computation and accuracy within the same architecture is amenable.…”

Section: Ego-motion Regression (mentioning)

confidence: 99%

Towards visual ego-motion learning in robots

Pillai, Leonard 2017

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Many model-based Visual Odometry (VO) algorithms have been proposed in the past decade, often restricted to the type of camera optics, or the underlying motion manifold observed. We envision robots to be able to learn and perform these tasks, in a minimally supervised setting, as they gain more experience. To this end, we propose a fully trainable solution to visual ego-motion estimation for varied camera optics. We propose a visual ego-motion learning architecture that maps observed optical flow vectors to an ego-motion density estimate via a Mixture Density Network (MDN). By modeling the architecture as a Conditional Variational Autoencoder (C-VAE), our model is able to provide introspective reasoning and prediction for ego-motion induced scene-flow. Additionally, our proposed model is especially amenable to bootstrapped ego-motion learning in robots where the supervision in ego-motion estimation for a particular camera sensor can be obtained from standard navigation-based sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through experiments, we show the utility of our proposed approach in enabling the concept of self-supervised learning for visual ego-motion estimation in autonomous robots.

“…CNNs are also capable of ego-motion estimation (Konda and Memisevic, 2015). However, the results need to be improved to compete with conventional methods.…”

Section: Introduction (mentioning)

confidence: 99%

SqueezePoseNet: Image Based Pose Regression with Small Convolutional Neural Networks for Real Time UAS Navigation

Müller, Urban, Jutzi 2017

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

The number of unmanned aerial vehicles (UAVs) is increasing since low-cost airborne systems are available for a wide range of users. The outdoor navigation of such vehicles is mostly based on global navigation satellite system (GNSS) methods to gain the vehicle's trajectory. The drawbacks of satellite-based navigation are failures caused by occlusions and multi-path interferences. Besides this, local image-based solutions like Simultaneous Localization and Mapping (SLAM) and Visual Odometry (VO) can e.g. be used to support the GNSS solution by closing trajectory gaps but are computationally expensive. However, if the trajectory estimation is interrupted or not available a re-localization is mandatory. In this paper we will provide a novel method for a GNSS-free and fast image-based pose regression in a known area by utilizing a small convolutional neural network (CNN). With on-board processing in mind, we employ a lightweight CNN called SqueezeNet and use transfer learning to adapt the network to pose regression. Our experiments show promising results for GNSS-free and fast localization.
