Abstract-This paper proposes an algorithm that enables robots to efficiently learn human-centric models of their environment from natural language descriptions. Typical semantic mapping approaches augment metric maps with higher-level properties of the robot's surroundings (e.g., place type, object locations), but do not use this information to improve the metric map. The novelty of our algorithm lies in fusing high-level knowledge, conveyed by speech, with metric information from the robot's low-level sensor streams. Our method jointly estimates a hybrid metric, topological, and semantic representation of the environment. This semantic graph provides a common framework in which we integrate concepts from natural language descriptions (e.g., labels and spatial relations) with metric observations from low-level sensors. Our algorithm efficiently maintains a factored distribution over semantic graphs based upon the stream of natural language and low-level sensor information. We evaluate the algorithm's performance and demonstrate that the incorporation of information from natural language increases the metric, topological and semantic accuracy of the recovered environment model.
Abstract-Natural language offers an intuitive and flexible means for humans to communicate with the robots that we will increasingly work alongside in our homes and workplaces. Recent advancements have given rise to robots that are able to interpret natural language manipulation and navigation commands, but these methods require a prior map of the robot's environment. In this paper, we propose a novel learning framework that enables robots to successfully follow natural language route directions without any previous knowledge of the environment. The algorithm utilizes spatial and semantic information that the human conveys through the command to learn a distribution over the metric and semantic properties of spatially extended environments. Our method uses this distribution in place of the latent world model and interprets the natural language instruction as a distribution over the intended behavior. A novel belief space planner reasons directly over the map and behavior distributions to solve for a policy using imitation learning. We evaluate our framework on a voice-commandable wheelchair. The results demonstrate that by learning and performing inference over a latent environment model, the algorithm is able to successfully follow natural language route directions within novel, extended environments.
Abstract-We describe a semantic mapping algorithm that learns human-centric environment models by interpreting natural language utterances. Underlying the approach is a coupled metric, topological, and semantic representation of the environment that enables the method to infer and fuse information from natural language descriptions with low-level metric and appearance data. We extend earlier work with a novel formulation that incorporates spatial layout into a topological representation of the environment. We also describe a factor graph formulation of the semantic properties that encodes human-centric concepts such as type and colloquial name for each mapped region. The algorithm infers these properties by combining the user's natural language descriptions with image- and laser-based scene classification. We also propose a mechanism to more effectively ground natural language descriptions of spatially non-local regions using semantic cues from other modalities. We describe how the algorithm employs this learned semantic information to propose valid topological hypotheses, leading to more accurate topological and metric maps. We demonstrate that integrating language with other sensor data increases the accuracy of the achieved spatial-semantic representation of the environment.
Abstract. Natural language provides a flexible, intuitive way for people to command robots, which is becoming increasingly important as robots transition to working alongside people in our homes and workplaces. To follow instructions in unknown environments, robots will be expected to reason about parts of the environments that were described in the instruction, but that the robot has no direct knowledge about. This paper proposes a probabilistic framework that enables robots to follow commands given in natural language, without any prior knowledge of the environment. The novelty lies in exploiting environment information implicit in the instruction, thereby treating language as a type of sensor which is used to formulate a prior distribution over the unknown parts of the environment. The algorithm then uses this learned distribution to infer a sequence of actions that are most consistent with the command, updating our belief as we gather more metric information. We evaluate our approach through simulation as well as experiments on two mobile robots; our results demonstrate the algorithm's ability to follow navigation commands with performance comparable to that of a fully-known environment.
We describe a robotic tour-taking capability enabling a robot to acquire local knowledge of a human-occupied environment. A tour-taking robot autonomously follows a human guide through an environment, interpreting the guide's spoken utterances and the shared spatiotemporal context in order to acquire a spatially segmented and semantically labeled metrical-topological representation of the environment. The described tour-taking capability enables scalable deployment of mobile robots into human-occupied environments, and natural human-robot interaction for commanded mobility. Our primary contributions are an efficient, socially acceptable autonomous tour-following behavior and a tour interpretation algorithm that partitions a map into spaces labeled according to the guide's utterances. The tour-taking behavior is demonstrated in a multi-floor office building and evaluated by assessing the comfort of the tour guides, and by comparing the robot's map partitions to those produced by humans.