Abstract. This paper proposes a robust visual odometry algorithm that uses a Kinect-style RGB-D sensor and an inertial measurement unit (IMU) in a highly dynamic environment. Based on the SURF (Speeded-Up Robust Features) descriptor, the proposed algorithm generates 3-D feature points by incorporating depth information into the RGB color information. Using the IMU, the generated 3-D feature points are rotated so that two consecutive images share the same rigid body rotation component. Before the rigid body transformation matrix between successive images from the RGB-D sensor is calculated, the generated 3-D feature points are classified as dynamic or static using motion vectors. From the static feature points, the rigid body transformation matrix is then computed with the RANSAC (RANdom SAmple Consensus) algorithm. The experiments demonstrate that the proposed algorithm successfully obtains visual odometry for a subject and a mobile robot in a highly dynamic environment. A comparative study between the proposed method and a conventional visual odometry algorithm clearly shows the reliability of the approach for computing visual odometry in a highly dynamic environment.
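The final step described above, estimating the frame-to-frame rigid body transformation from the static 3-D feature correspondences with RANSAC, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3-point sampling, the Kabsch-style SVD fit, the inlier threshold `thresh`, and the iteration count are all illustrative choices, and the arrays `P` and `Q` stand in for the matched static 3-D feature points of two consecutive frames.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch/SVD)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

def ransac_rigid_transform(P, Q, iters=200, thresh=0.05):
    """RANSAC over minimal 3-point samples; keeps the hypothesis with most inliers."""
    best_inliers = np.zeros(len(P), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = rigid_transform(P[idx], Q[idx])
        residuals = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers of the best hypothesis
    return rigid_transform(P[best_inliers], Q[best_inliers]), best_inliers
```

In practice the inlier threshold would be tuned to the depth noise of the RGB-D sensor, and only correspondences already classified as static would be passed in as `P` and `Q`.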
Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in place of humans. Since these human-scale tasks consist of temporal sequences of events, robots need episodic memory to store and retrieve such sequences so that they can perform the tasks autonomously in similar situations. As such an episodic memory, in this paper we propose a novel deep adaptive resonance theory (ART) neural model and apply it to task performance by the humanoid robot Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure that learns events, episodes, and even higher-level sequences such as daily episodes. Moreover, it can robustly retrieve the correct episode from partial input cues. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot Mybot on three tasks: arranging toys, making cereal, and disposing of garbage.
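The abstract does not spell out the Deep ART equations, so the following is only a generic Fuzzy ART layer sketch (category choice, vigilance test, fast learning) meant to illustrate the ART-style matching on which such models are typically built; the class name, parameter values, and interface are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

class FuzzyART:
    """Minimal Fuzzy ART layer: category choice, vigilance test, fast learning."""
    def __init__(self, alpha=0.001, beta=1.0, rho=0.7):
        self.alpha, self.beta, self.rho = alpha, beta, rho
        self.w = []                                   # one weight vector per learned category

    def _complement_code(self, x):
        # inputs are assumed to lie in [0, 1]
        return np.concatenate([x, 1.0 - x])

    def present(self, x, learn=True):
        I = self._complement_code(np.asarray(x, dtype=float))
        if not self.w:
            self.w.append(I.copy())
            return 0
        # choice function ranks categories; the vigilance test accepts or rejects the match
        T = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
        for j in np.argsort(T)[::-1]:
            w = self.w[j]
            if np.minimum(I, w).sum() / I.sum() >= self.rho:   # resonance
                if learn:
                    self.w[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                return j
        self.w.append(I.copy())                       # no resonance: create a new category
        return len(self.w) - 1
```

Querying such a layer with `learn=False` (optionally with a lowered vigilance) returns the best-matching stored category for an incomplete input, which corresponds to the retrieval-from-partial-cues behavior the abstract highlights.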
Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality and robot navigation. This is particularly useful in indoor environments, where other localization technologies, such as GNSS, fail. Indoor spaces impose interesting challenges on visual localization algorithms: occlusions due to people, textureless surfaces, large viewpoint changes, low light, repetitive textures, etc. Existing indoor datasets are either comparatively small or cover only a subset of the mentioned challenges. In this paper, we introduce five new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. To obtain accurate ground-truth camera poses, we developed a robust LiDAR SLAM system that provides initial poses, which are then refined using a novel structure-from-motion-based optimization. We present a benchmark of modern visual localization algorithms on these challenging datasets, showing the superior performance of structure-based methods that use robust image features. The datasets are available at: https://naverlabs.com/datasets
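Localization benchmarks of this kind are commonly evaluated by comparing each estimated camera pose to the ground-truth pose and reporting the fraction of queries localized within position and orientation thresholds. The sketch below illustrates that evaluation; the 0.25 m / 2° thresholds are common example values and are not taken from the paper.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (same units as t) and rotation error (degrees)
    between an estimated and a ground-truth camera pose."""
    t_err = np.linalg.norm(t_est - t_gt)
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

def recall_at(errors, t_thresh=0.25, r_thresh=2.0):
    """Fraction of queries localized within the given position/orientation thresholds."""
    return np.mean([(te <= t_thresh) and (re <= r_thresh) for te, re in errors])
```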