Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning

Chen, Yu Fan; Miao, Laicheng; Everett, Michael; How, Jonathan P.

doi:10.1109/icra.2017.7989037

Cited by 493 publications

(393 citation statements)

References 22 publications

Supporting

Mentioning

392

Contrasting

Unclassified


“…Current state-of-the-art optimal planners can plan for several hundreds of agents, and the community is now settling for bounded suboptimal planners as a potential solution for even larger multi-agent systems [3], [9]. Another common approach is to rely on reactive planners, which do not plan joint paths for all agents before execution, but rather correct individual paths online to avoid collisions [5], [10]. However, such planners often prove inefficient in cluttered factory environments (such as Fig.…”

Section: Introduction (mentioning)

confidence: 99%

PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning

Sartoretti, Kerr, Shi, et al. 2019, IEEE Robot. Autom. Lett.

260

180


Multi-agent path finding (MAPF) is an essential component of many large-scale, real-world robot deployments, from aerial swarms to warehouse automation. However, despite the community's continued efforts, most state-of-the-art MAPF planners still rely on centralized planning and scale poorly past a few hundred agents. Such planning approaches are maladapted to real-world deployments, where noise and uncertainty often require paths be recomputed online, which is impossible when planning times are in seconds to minutes. We present PRIMAL, a novel framework for MAPF that combines reinforcement and imitation learning to teach fully-decentralized policies, where agents reactively plan paths online in a partially-observable world while exhibiting implicit coordination. This framework extends our previous work on distributed learning of collaborative policies by introducing demonstrations of an expert MAPF planner during training, as well as careful reward shaping and environment sampling. Once learned, the resulting policy can be copied onto any number of agents and naturally scales to different team sizes and world dimensions. We present results on randomized worlds with up to 1024 agents and compare success rates against state-of-the-art MAPF planners. Finally, we experimentally validate the learned policies in a hybrid simulation of a factory mockup, involving both real-world and simulated robots.


“…This RL framework applies a reward function, $R_{col}(s^{jn}, u)$, to penalize the agent in case of collision, and reward in case of reaching its goal. Two different types of RL algorithms are used in this RL framework, value-based [22], [15] and policy-based [14] learning. Value-based algorithm assumes that other agents continue their current velocities until next step, ∆t, to be able to extract policy from the value function, $V(s^{jn}_t)$.…”

Section: A Collision Avoidance With Deep RL (GA3C-CADRL) (mentioning)

confidence: 99%

Multi-Agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

Semnani, Liu, Everett, et al. 2020, IEEE Robot. Autom. Lett.

Self Cite


This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and Force-based motion planning (FMP) to solve distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a pre supervised learning (SL) step but also decreases the chance of collision in crowded environments. That improved things, but there were still a lot of failure cases. So, we developed a hybrid approach to leverage the simpler FMP approach in stuck, simple and high-risk cases, and continue using RL for normal cases in which FMP can't produce optimal path. Also, we extend GA3C-CADRL algorithm to 3D environment. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms and produces up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach goal than FMP.


“…In addition, a study by Namazi et al [34] shows that traditional machine learning-based solutions are not suitable for a complex and dynamic environment such as autonomous driving. Leveraging deep learning especially the Convolutional Neural Networks (CNNs), Lv et al [35] handled collision avoidance by predicting the traffic flow while Chen et al [10] utilized DRL with multi-agents settings to avoid collisions. In addition, Cheng et al [36] formulated an automated enemy avoidance problem with Markov Decision Process and resolved it with temporal-difference reinforcement learning.…”

Section: A Collision Avoidance (mentioning)

confidence: 99%

“…The high computational burden of an optimization-based centralized scheme makes the deployment of the control system on real platforms challenging. On the other hand, Chen et al [10] developed a decentralized multi-agent collision avoidance algorithm where two agents were simulated to navigate toward their own goal positions and learn a value network that encodes the expected time to goal. However, cooperative information among robots is not accounted for in the solution and the design is not suitable for high speed scenarios.…”

Section: B Multi-agent Collision Avoidance (mentioning)

confidence: 99%

RACE: Reinforced Cooperative Autonomous Vehicle Collision Avoidance

Yuan, Tasik, Adhatarao, et al. 2020, IEEE Trans. Veh. Technol.


With the rapid development of autonomous driving, collision avoidance has attracted attention from both academia and industry. Many collision avoidance strategies have emerged in recent years, but the dynamic and complex nature of driving environment poses a challenge to develop robust collision avoidance algorithms. Therefore, in this paper, we propose a decentralized framework named RACE: Reinforced Cooperative Autonomous Vehicle Collision AvoidancE. Leveraging a hierarchical architecture we develop an algorithm named Co-DDPG to efficiently train autonomous vehicles. Through a security abiding channel, the autonomous vehicles distribute their driving policies. We use the relative distances obtained by the opponent sensors to build the VANET instead of locations, which ensures the vehicle's location privacy. With a leader-follower architecture and parameter distribution, RACE accelerates the learning of optimal policies and efficiently utilizes the remaining resources. We implement the RACE framework in the widely used TORCS simulator and conduct various experiments to measure the performance of RACE. Evaluations show that RACE quickly learns optimal driving policies and effectively avoids collisions. Moreover, RACE also scales smoothly with varying number of participating vehicles. We further compared RACE with existing autonomous driving systems and show that RACE outperforms them by experiencing 65% less collisions in the training process and exhibits improved performance under varying vehicle density.
