We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. The sampling-based planners then provide roadmaps that connect robot configurations the RL agent can successfully navigate between. The same RL agents control the robot under the direction of the planner, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using feature-based and deep-neural-network policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential-drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load-displacement constraints. Our results show improved task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 m long under noisy sensor conditions, and in the aerial cargo delivery task it completes flights over 1000 m without violating the task constraints, in an environment 63 million times larger than the one used in training.
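As a sketch of the core idea, the roadmap-construction step can be illustrated as follows. The names `sample_config`, `rl_agent_connects`, and `dist` are hypothetical placeholders: in PRM-RL, the connection check would come from rolling out the short-range RL policy in simulation, so this is an assumed interface rather than the paper's actual implementation.

```python
import random

def build_prm_rl_roadmap(sample_config, rl_agent_connects, dist,
                         n_samples=200, n_neighbors=5):
    """Sketch of PRM-RL roadmap construction (hypothetical API).

    sample_config:     callable returning a random robot configuration.
    rl_agent_connects: callable (start, goal) -> bool; in PRM-RL this check
                       is answered by rolling out the short-range RL policy
                       in simulation, here it is any user-supplied predicate.
    dist:              callable giving a distance between two configurations.
    """
    nodes = [sample_config() for _ in range(n_samples)]
    edges = {i: set() for i in range(n_samples)}
    for i, q in enumerate(nodes):
        # Try to connect each node to its nearest neighbors, but keep an
        # edge only if the RL agent can actually navigate between them.
        neighbors = sorted(range(n_samples), key=lambda j: dist(nodes[j], q))
        for j in neighbors[1:n_neighbors + 1]:
            if i != j and rl_agent_connects(q, nodes[j]):
                edges[i].add(j)
                edges[j].add(i)
    return nodes, edges
```

A long-range query would then run a graph search over these edges and hand each consecutive node pair back to the same RL agent for execution.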
Cargo-bearing unmanned aerial vehicles (UAVs) have tremendous potential to assist humans by delivering food, medicine, and other supplies. For time-critical cargo delivery tasks, UAVs must navigate their environments quickly and deliver suspended payloads with bounded load displacement. Because it requires balancing constraints on the joint dynamics of the UAV and its suspended load, this task is challenging. This article presents a reinforcement learning approach for aerial cargo delivery tasks in environments with static obstacles. We first learn a minimal-residual-oscillations task policy in obstacle-free environments, using a specifically designed feature vector for value-function approximation that allows generalization beyond the training domain. The method works in continuous state and discrete action spaces. Since planning for aerial cargo requires a very large action space (over 10^6 actions) that is impractical for learning, we define formal conditions for a class of robotics problems in which learning can occur in a simplified problem space and successfully transfer to a broader problem space. Exploiting these guarantees and relying on the discrete action space, we learn the swing-free policy in a subspace several orders of magnitude smaller, and later develop a method for swing-free trajectory planning along a path. To extend the approach to environments with static obstacles, where load displacement must be bounded throughout the trajectory, sampling-based motion planning generates collision-free paths. A reinforcement learning agent then transforms these paths into trajectories that maintain the bound on load displacement while following the collision-free path in a timely manner. We verify the approach both in simulation and in experiments on a quadrotor with a suspended load, and demonstrate the method's safety and feasibility by having the quadrotor deliver an open container of liquid to a human subject.
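The value-function approximation described above can be sketched with a linear form over a task feature vector. The particular `features` used here (squared position-error and load-displacement terms) and the one-step greedy policy are illustrative assumptions, not the article's exact formulation:

```python
import numpy as np

def features(state):
    # Hypothetical feature vector for a minimal-residual-oscillations task:
    # squared position error, velocity, load angle, and load angular rate.
    # Quadratic features of this kind let a linear value estimate
    # generalize beyond the states seen during training.
    pos_err, vel, load_angle, load_rate = state
    return np.array([pos_err**2, vel**2, load_angle**2, load_rate**2])

def value(state, theta):
    # Linear value-function approximation: V(s) = theta . f(s).
    return float(np.dot(theta, features(state)))

def greedy_action(state, actions, step, theta):
    # Discrete action space: pick the action whose successor state
    # (under the simulated dynamics `step`) has the highest value.
    return max(actions, key=lambda a: value(step(state, a), theta))
```

With negative weights on all features, the greedy policy drives both the position error and the load swing toward zero, which is the qualitative behavior the swing-free policy needs.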
The contributions of this work are twofold. First, this article presents a solution to a challenging and vital problem: planning a constraint-balancing task for an inherently unstable, non-linear system in the presence of obstacles. Second, both AI and robotics researchers can benefit from the provided theoretical guarantees of system stability for a class of constraint-balancing tasks that occur in very large action spaces.
Abstract. Although there are many motion planning techniques, no single method outperforms all others on all problem instances. Rather, each technique has different strengths and weaknesses that make it best suited to certain types of problems. Moreover, since an environment can contain vastly different regions, there may not be a single planner that performs well in all of them. Ideally, one would use a suite of planners in concert, solving the problem by applying the best-suited planner in each region. In this paper, we propose an automated framework for feature-sensitive motion planning. We use a machine learning approach to characterize and partition C-space into regions that are well suited to one of the methods in our library of roadmap-based motion planners. After the best-suited method is applied in each region, the resulting region roadmaps are combined to form a roadmap of the entire planning space. Over a range of problems, we demonstrate that our simple prototype system reliably outperforms any of the planners on its own.
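A minimal sketch of the per-region planner selection and roadmap merging, assuming an opaque `classify` function standing in for the learned region classifier and a `planner_library` keyed by region label; both names are assumptions for illustration:

```python
def plan_feature_sensitive(regions, classify, planner_library):
    """Apply the best-suited planner in each region, then merge roadmaps.

    regions:         list of C-space regions (opaque objects).
    classify:        callable region -> label; hypothetically a classifier
                     trained on region features such as free-space ratio.
    planner_library: dict label -> planner, where each planner maps a
                     region to a (nodes, edges) roadmap with local indices.
    """
    nodes, edges = [], []
    for region in regions:
        r_nodes, r_edges = planner_library[classify(region)](region)
        # Re-index the region roadmap into the global roadmap.
        offset = len(nodes)
        nodes.extend(r_nodes)
        edges.extend((u + offset, v + offset) for u, v in r_edges)
    # A full system would also add connection edges between adjacent
    # region roadmaps; that step is omitted in this sketch.
    return nodes, edges
```

The key design choice is that each planner only sees its own region, so a planner tuned for cluttered spaces is never wasted on wide-open areas, and vice versa.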