Most previous learning-based visual odometry (VO) methods treat VO as a pure tracking problem. In contrast, we present a VO framework that incorporates two additional components called Memory and Refining. The Memory component preserves global information by employing an adaptive and efficient selection strategy. The Refining component improves previous results with the contexts stored in the Memory, adopting a spatial-temporal attention mechanism for feature distillation. Experiments on the KITTI and TUM-RGBD benchmark datasets demonstrate that our method outperforms state-of-the-art learning-based methods by a large margin and produces competitive results against classic monocular VO approaches. In particular, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic VO algorithms tend to fail.
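The Refining step can be pictured as attention over a memory of past frame features. Below is a minimal PyTorch sketch of that idea, assuming a purely temporal attention (the paper describes a spatial-temporal variant) and illustrative names throughout (`MemoryRefiner`, `feat_dim`, `mem_size`, `pose_head`); it is not the authors' implementation.

```python
# Sketch: refine the current pose estimate by attending over a memory buffer.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MemoryRefiner(nn.Module):
    def __init__(self, feat_dim=256, mem_size=8):
        super().__init__()
        self.mem_size = mem_size
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.value = nn.Linear(feat_dim, feat_dim)
        self.pose_head = nn.Linear(2 * feat_dim, 6)  # 3 rotation + 3 translation

    def forward(self, current, memory):
        # current: (B, D) feature of the frame being refined
        # memory:  (B, T, D) features preserved from selected earlier frames
        q = self.query(current).unsqueeze(1)              # (B, 1, D)
        k = self.key(memory)                              # (B, T, D)
        v = self.value(memory)                            # (B, T, D)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        context = (attn @ v).squeeze(1)                   # (B, D) distilled context
        return self.pose_head(torch.cat([current, context], dim=-1))

# Usage: refine a 6-DoF pose from the current feature and an 8-frame memory.
refiner = MemoryRefiner()
pose = refiner(torch.randn(2, 256), torch.randn(2, 8, 256))  # (2, 6)
```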
We present a novel end-to-end visual odometry architecture with guided feature selection based on deep convolutional recurrent neural networks. Unlike current monocular visual odometry methods, our approach is built on the intuition that features contribute discriminatively to different motion patterns. Specifically, we propose a dual-branch recurrent network that learns rotation and translation separately, leveraging a Convolutional Neural Network (CNN) for feature representation and a Recurrent Neural Network (RNN) for image-sequence reasoning. To enhance feature selection, we further introduce an effective context-aware guidance mechanism that forces each branch to explicitly distill the information relevant to its specific motion pattern. Experiments demonstrate that on the prevalent KITTI and ICL-NUIM benchmarks, our method outperforms current state-of-the-art model- and learning-based methods for both decoupled and joint camera pose recovery.
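A minimal sketch of the dual-branch design follows, assuming a simple sigmoid gate stands in for the context-aware guidance; the class and layer names (`DualBranchVO`, `rot_gate`, `trans_gate`) are hypothetical, not the paper's code.

```python
# Sketch: two recurrent branches consume the same CNN features, each gated
# by its own learned selection mask before predicting rotation or translation.
import torch
import torch.nn as nn

class DualBranchVO(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        # Assumed stand-in for "guidance": per-branch sigmoid feature gates.
        self.rot_gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.trans_gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.rot_rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.trans_rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.rot_head = nn.Linear(hidden, 3)
        self.trans_head = nn.Linear(hidden, 3)

    def forward(self, feats):
        # feats: (B, T, feat_dim) CNN features of consecutive frame pairs
        rot_h, _ = self.rot_rnn(feats * self.rot_gate(feats))
        trans_h, _ = self.trans_rnn(feats * self.trans_gate(feats))
        return self.rot_head(rot_h), self.trans_head(trans_h)  # (B, T, 3) each

# Usage: per-step rotation and translation from a 10-frame feature sequence.
model = DualBranchVO()
rot, trans = model(torch.randn(2, 10, 512))
```

Gating the shared features separately per branch is one simple way to let rotation and translation draw on different cues, which is the intuition the abstract describes.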
We extend the standard mean shift tracking algorithm to an adaptive tracker by selecting reliable features from color and shape cues. The standard mean shift algorithm assumes that the representation of the tracking target is always sufficiently discriminative against the background. Most tracking algorithms built on mean shift use only one cue (such as color) throughout the tracking process. The widely used color features are not always discriminative enough for target localization because illumination and viewpoint tend to change; moreover, the background may be of a color similar to that of the target. We present an adaptive tracking algorithm that integrates color and shape features, selecting good features to represent the target according to their descriptive ability. The proposed method has been implemented and tested on different kinds of image sequences. The experimental results demonstrate that our tracking algorithm is robust and efficient on challenging image sequences.
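The cue-selection loop can be sketched with OpenCV's built-in mean shift. The sketch below scores a hue back-projection against a gradient-magnitude "shape" cue by how much of each map's mass falls inside the current window, then tracks with the more reliable one; the scoring rule (`cue_score`) and the gradient cue are illustrative assumptions, not the paper's exact features.

```python
# Sketch: adaptive mean shift tracking that picks the more reliable of a
# color cue and a crude shape cue at each frame. Heuristics are assumptions.
import cv2
import numpy as np

def backproject_shape(frame_gray):
    # Gradient magnitude as a simple stand-in for a shape/edge cue.
    gx = cv2.Sobel(frame_gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(frame_gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def cue_score(bp, window):
    # Reliability proxy: fraction of the map's mass inside the window.
    x, y, w, h = window
    return bp[y:y+h, x:x+w].sum() / (bp.sum() + 1e-6)

def track(frames, init_window):
    x, y, w, h = init_window
    hsv0 = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv0[y:y+h, x:x+w]], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = init_window
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        bp_color = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        bp_shape = backproject_shape(gray)
        # Select whichever cue concentrates better on the current window.
        bp = bp_color if cue_score(bp_color, window) >= cue_score(bp_shape, window) else bp_shape
        _, window = cv2.meanShift(bp, window, term)
        yield window
```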