What is a good visual representation for navigation? We study this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a previously unseen environment to a target object, e.g., "go to the refrigerator." Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose to use semantic segmentation and detection masks, obtained by state-of-the-art computer vision algorithms, as observations, and to learn the navigation policy with a deep network. The availability of equitable representations in simulated environments enables joint training on real and simulated data and alleviates the need for the domain adaptation or domain randomization commonly used to tackle sim-to-real transfer of learned policies. Both the representation and the navigation policy can be readily applied to real, non-synthetic environments, as demonstrated on the Active Vision Dataset [1]. Our approach successfully reaches the target in 54% of cases in unexplored environments, compared to 46% for a non-learning-based approach and 28% for a learning-based baseline.
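To make the idea concrete, here is a minimal sketch (not the paper's actual code) of a policy network that consumes semantic masks rather than raw RGB. The class count, action set, and layer sizes are illustrative assumptions; the point is that the same C-channel mask tensor can come from a simulator's ground truth or from a real segmenter/detector, so one policy serves both domains.

```python
# Sketch: a navigation policy over semantic segmentation / detection masks.
# NUM_CLASSES, ACTIONS, and the architecture are assumptions for illustration.
import torch
import torch.nn as nn

NUM_CLASSES = 40  # assumed number of semantic categories, one mask channel each
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]  # hypothetical action set

class MaskPolicy(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES, num_actions: int = len(ACTIONS)):
        super().__init__()
        # Encoder over binary/one-hot mask channels instead of RGB pixels.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_classes, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        # masks: (B, num_classes, H, W) segmentation/detection mask channels
        return self.head(self.encoder(masks))

policy = MaskPolicy()
obs = torch.zeros(1, NUM_CLASSES, 84, 84)  # stand-in for one segmented frame
obs[0, 7].fill_(1.0)                       # e.g. the "refrigerator" channel fires
action = policy(obs).argmax(dim=-1)
print(ACTIONS[action.item()])
```

Because the observation is a semantic abstraction, swapping the simulator's perfect masks for noisy detector output at deployment changes the input distribution far less than swapping rendered pixels for camera pixels would.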
Learned neural-network-based policies have shown promising results for robot navigation. However, most of these approaches fall short of being usable on a real robot because of the extensive simulated training they require: the simulations lack the visuals and dynamics of the real world, which makes the resulting policies infeasible to deploy. We present a novel neural-network-based policy, NavNet, which allows for easy deployment on a real robot. It consists of two sub-policies: a high-level policy that can understand real images and perform long-range planning expressed in high-level commands, and a low-level policy that can translate the long-range plan into low-level commands for a specific platform in a safe and robust manner. For every new deployment, the high-level policy is trained on an easily obtainable scan of the environment that models its visuals and layout. We detail the design of such an environment and how it can be used for training a final navigation policy. Further, we demonstrate a learned low-level policy. We deploy the model in a large office building and test it extensively, achieving a 0.80 success rate over long navigation runs and outperforming SLAM-based approaches in the same settings.
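The split between the two sub-policies can be sketched as a simple control loop; the command vocabulary, function names, and safety logic below are assumptions for illustration, not NavNet's actual interfaces.

```python
# Sketch of a two-level navigation stack: a high-level policy maps an image
# and goal to an abstract command; a low-level policy turns that command into
# platform controls. All names and values here are illustrative assumptions.
from dataclasses import dataclass

HIGH_LEVEL_COMMANDS = ["go_straight", "turn_left", "turn_right", "stop"]  # assumed vocabulary

@dataclass
class VelocityCommand:
    linear: float   # m/s
    angular: float  # rad/s

def high_level_policy(image, goal: str) -> str:
    """Placeholder for the learned long-range planner.

    In the real system this would be a network trained on a scan of the
    deployment environment; here it trivially halts.
    """
    return "stop"

def low_level_policy(command: str, obstacle_ahead: bool) -> VelocityCommand:
    """Translate an abstract command into safe, platform-specific controls.

    The obstacle override stands in for the learned, robustness-focused
    low-level controller described in the abstract.
    """
    if obstacle_ahead or command == "stop":
        return VelocityCommand(0.0, 0.0)
    if command == "go_straight":
        return VelocityCommand(0.5, 0.0)
    if command == "turn_left":
        return VelocityCommand(0.0, 0.8)
    if command == "turn_right":
        return VelocityCommand(0.0, -0.8)
    raise ValueError(f"unknown command: {command}")

# One control tick: plan at the abstract level, execute at the platform level.
cmd = high_level_policy(image=None, goal="kitchen")
print(low_level_policy(cmd, obstacle_ahead=False))
```

The design choice this illustrates is that only the high-level policy must be retrained per deployment (from an environment scan), while the low-level controller stays fixed to the robot platform.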