A pursuit–evasion game is a classical maneuver-confrontation problem in the multi-agent systems (MASs) domain. This paper develops an online decision-making technique based on deep reinforcement learning (DRL) to address environment sensing and decision-making in pursuit–evasion games. A control-oriented framework built on the DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm implements multi-agent cooperative decision-making, avoiding the tedious state variables required by the traditionally complicated modeling process. To address the effects of errors between a model and a real scenario, the paper introduces adversarial disturbances and proposes a novel adversarial-attack trick together with an adversarial learning MADDPG (A2-MADDPG) algorithm. By applying the adversarial-attack trick to the agents themselves, real-world uncertainties are modeled, which makes the training more robust. During training, adversarial learning preprocesses the actions of the multiple agents, enabling them to respond properly to uncertain dynamic changes in the MAS. Experimental results verify that the proposed approach delivers superior performance and effectiveness for both pursuers and evaders, each of which learns a corresponding confrontation strategy during training.
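To make the adversarial-action preprocessing idea concrete, here is a minimal Python/NumPy sketch, not the authors' implementation: a joint action is perturbed inside a bounded L-infinity ball so as to minimize a stand-in centralized critic, i.e., the policy is trained against an approximate worst-case disturbance. The quadratic `surrogate_critic`, the random-search attack (substituting for whatever attack the paper actually uses), and parameters such as `epsilon` and `n_candidates` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_critic(state, joint_action):
    # Stand-in for the centralized critic Q(s, a_1, ..., a_N):
    # here it simply penalizes deviation from a state-dependent target.
    return -np.sum((joint_action - np.tanh(state)) ** 2)

def adversarial_disturb(state, joint_action, epsilon=0.1, n_candidates=16):
    """Random-search approximation of a worst-case bounded disturbance:
    sample perturbations inside an L-infinity ball of radius epsilon and
    keep the perturbed joint action that minimizes the critic value."""
    best_a = joint_action
    best_q = surrogate_critic(state, joint_action)
    for _ in range(n_candidates):
        delta = rng.uniform(-epsilon, epsilon, size=joint_action.shape)
        candidate = np.clip(joint_action + delta, -1.0, 1.0)
        q = surrogate_critic(state, candidate)
        if q < best_q:
            best_a, best_q = candidate, q
    return best_a  # perturbed action used for robust training

state = rng.standard_normal(4)
action = rng.uniform(-1.0, 1.0, size=4)  # joint action of all agents
perturbed = adversarial_disturb(state, action)
print("clean Q:", surrogate_critic(state, action))
print("worst-case Q:", surrogate_critic(state, perturbed))
```

Training the actors on such perturbed actions, rather than the clean ones, is one way the robustness-to-model-error behavior described above can be realized.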
Developing efficient motion policies for multiple agents is challenging in a decentralized dynamic situation, where each agent plans its own path without knowing the policies of the other agents involved. This paper presents an efficient learning-based motion planning method for multi-agent systems. It adopts the multi-agent deep deterministic policy gradient (MADDPG) framework to map partially observed information directly to motion commands for multiple agents. To improve the sample efficiency of MADDPG, and thus train stronger agents that can adapt to more complex environments, a mixed experience (ME) strategy is introduced into MADDPG, yielding the proposed ME-MADDPG algorithm. The ME strategy comprises three mechanisms, sketched below: (1) an artificial potential field (APF)-based sample generator that produces high-quality samples in the early training stage; (2) a dynamic mixed sampling strategy that mixes training data from different sources in a variable proportion; and (3) a delayed learning skill that stabilizes the training of the multiple agents. A series of experiments verifies that, compared with MADDPG, the proposed ME-MADDPG algorithm significantly improves convergence speed and convergence quality during training, and that it offers better efficiency and adaptability in complex dynamic environments when used for multi-agent motion planning.
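The following Python/NumPy sketch illustrates the first two ME mechanisms under stated assumptions; it is not the paper's implementation. `apf_action` is a textbook attractive/repulsive potential-field rule standing in for the paper's sample generator, and `mixed_batch` implements one plausible dynamic mixing schedule in which the share of APF demonstrations decays exponentially with the training step. All names and parameters (`k_att`, `k_rep`, `d0`, `ratio0`, `decay`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def apf_action(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Classic APF rule: attraction toward the goal plus repulsion from
    obstacles closer than the influence distance d0."""
    force = k_att * (goal - pos)
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff) + 1e-8
        if d < d0:
            force += k_rep * (1.0 / d - 1.0 / d0) * diff / d**3
    n = np.linalg.norm(force)
    return force / n if n > 1.0 else force  # clip speed to 1

def mixed_batch(apf_buffer, rl_buffer, step, batch=64, ratio0=0.8, decay=1e-4):
    """Dynamic mixed sampling: the proportion of APF demonstrations in a
    training batch decays with the training step."""
    ratio = ratio0 * np.exp(-decay * step)
    n_apf = min(int(batch * ratio), len(apf_buffer))
    idx_a = rng.choice(len(apf_buffer), size=n_apf, replace=False)
    idx_r = rng.choice(len(rl_buffer), size=batch - n_apf, replace=True)
    return [apf_buffer[i] for i in idx_a] + [rl_buffer[i] for i in idx_r]

# Tiny demo: buffers hold (source, action) pairs so that the batch
# composition at different training steps is visible.
goal, obstacles = np.ones(2), [np.array([0.5, 0.5])]
apf_buf = [("apf", apf_action(rng.uniform(0, 1, 2), goal, obstacles))
           for _ in range(200)]
rl_buf = [("rl", rng.uniform(-1, 1, 2)) for _ in range(200)]
for step in (0, 50_000):
    batch = mixed_batch(apf_buf, rl_buf, step)
    print(step, sum(1 for src, _ in batch if src == "apf"), "APF of", len(batch))
```

The third mechanism, delayed learning, would correspond to updating the actor networks only every few critic updates (in the spirit of TD3's delayed policy updates); it is omitted here for brevity.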