While numerous deep approaches to the problem of vision-aided localization have recently been proposed, systems operating in the real world will inevitably encounter sensory states unseen during even the most prodigious training regimens. We address the localization problem with online error correction (OEC) modules that are trained to correct a vision-aided localization network's mistakes. We demonstrate the generalizability of the OEC modules and describe our unsupervised deep neural network approach to the fusion of RGB-D imagery with inertial measurements for absolute trajectory estimation. Our network, dubbed the Visual-Inertial-Odometry Learner (VIOLearner), learns to perform visual-inertial odometry (VIO) without inertial measurement unit (IMU) intrinsic parameters (i.e., gyroscope and accelerometer bias or white noise) or the extrinsic calibration between the IMU and camera. The network learns to integrate IMU measurements and generate hypothesis trajectories, which are then corrected online according to the Jacobians of scaled image projection errors with respect to spatial grids of pixel coordinates. We evaluate our network against state-of-the-art (SoA) VIO, visual odometry (VO), and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI Odometry dataset as well as a micro aerial vehicle (MAV) dataset that we collected in the AirSim simulation environment. On our evaluation sequences, our network achieves translational localization performance exceeding that of comparable SoA approaches.
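To make the core mechanism concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' implementation) of the quantity the abstract describes: the Jacobian of a photometric reprojection error with respect to a spatial grid of pixel sampling coordinates. In VIOLearner, learned OEC modules consume such Jacobians to correct hypothesis trajectories; this sketch omits depth, camera intrinsics, the IMU-generated pose hypothesis, and the multi-scale structure of the full network, and the function and variable names are placeholders of our choosing.

import torch
import torch.nn.functional as F

def reprojection_error_jacobian(source_img, target_img, sample_grid):
    """Jacobian of the photometric reprojection error with respect to the
    spatial grid of coordinates at which the source image is sampled."""
    grid = sample_grid.clone().requires_grad_(True)
    # Differentiable bilinear warp of the source image at the hypothesized
    # coordinates (those a pose hypothesis would induce).
    warped = F.grid_sample(source_img, grid, align_corners=True)
    # Photometric error between the warped source and the target image.
    error = (warped - target_img).abs().mean()
    # d(error)/d(grid): one 2-vector per pixel, shape (B, H, W, 2).
    (jacobian,) = torch.autograd.grad(error, grid)
    return jacobian

# Toy usage with an identity sampling grid in normalized [-1, 1] coordinates.
B, C, H, W = 1, 3, 64, 64
source = torch.rand(B, C, H, W)
target = torch.rand(B, C, H, W)
identity = F.affine_grid(torch.eye(2, 3).unsqueeze(0), (B, C, H, W),
                         align_corners=True)
jac = reprojection_error_jacobian(source, target, identity)
print(jac.shape)  # torch.Size([1, 64, 64, 2])

In the paper's formulation, a grid like this would be produced by reprojecting pixels under an IMU-integrated pose hypothesis, and the resulting Jacobian field, rather than the error alone, is what the OEC modules map to a trajectory correction.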