This paper proposes a novel guidance law for intercepting high-speed maneuvering targets based on deep reinforcement learning. The approach comprises an interceptor–target relative motion model and a value-function approximation model based on a deep Q-network (DQN) with prioritized experience replay. First, prioritized experience replay is applied to draw more informative samples and reduce training time. Second, to cope with the discrete action space of DQN, the normal acceleration is added to the state space and the normal acceleration rate is chosen as the action; the continuous normal-acceleration command is then obtained by numerical integration. Third, to make the line-of-sight (LOS) rate converge rapidly, a reward function whose absolute value tends to zero is constructed. Finally, simulation experiments intercepting high-speed maneuvering targets under different acceleration policies compare the proposed method with proportional navigation guidance (PNG) and a Q-learning-based guidance law (QLG). The results demonstrate that the proposed DQN-based guidance law (DQNG) produces a continuous acceleration command, drives the LOS rate to zero rapidly, and hits maneuvering targets using only the LOS rate. They also confirm that DQNG realizes a parallel-like approach and improves the interceptor's performance against high-speed maneuvering targets. In addition, DQNG avoids the complicated formula derivations of traditional guidance laws and eliminates acceleration buffeting.
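The abstract's second step, choosing the normal acceleration rate as the discrete action and recovering a continuous acceleration command by numerical integration, can be sketched as follows. This is an illustrative assumption of how such a scheme might look; the action set, time step, and saturation limit are hypothetical values, not taken from the paper.

```python
# Sketch of the action-integration idea: the DQN selects a discrete
# normal-acceleration *rate*, and the continuous acceleration command is
# recovered by Euler integration. All names and numbers are illustrative.

# Discrete action set: candidate acceleration rates in m/s^3 (assumed).
ACTION_RATES = [-50.0, 0.0, 50.0]

def integrate_command(a_prev, action_idx, dt=0.01, a_max=200.0):
    """Euler-integrate the chosen acceleration rate into an acceleration
    command, saturated at an assumed overload limit a_max (m/s^2)."""
    a_cmd = a_prev + ACTION_RATES[action_idx] * dt
    return max(-a_max, min(a_max, a_cmd))

# Starting from zero, repeatedly applying the "increase" action ramps the
# command up smoothly instead of chattering between discrete extremes.
a = 0.0
for _ in range(10):
    a = integrate_command(a, action_idx=2)  # rate = +50 m/s^3
```

Because each step can change the command only by `rate * dt`, the resulting acceleration profile is continuous in time, which is the mechanism the abstract credits with eliminating acceleration buffeting.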
In this paper, an intelligent guidance law based on the deep Q-network (DQN) algorithm is proposed to enable a missile to intercept different maneuvering targets following the idea of the parallel-approach method. Specifically, we propose the inverse ratio of the absolute value of the line-of-sight (LOS) angle rate as the shaping reward function, which guarantees that a control strategy is found and speeds up training of the reinforcement learning (RL) model. Furthermore, to avoid the rapid chattering caused by directly choosing the missile acceleration as the action, we take the rate of change of the acceleration as the action in the DQN algorithm and integrate it to obtain the acceleration command. Consequently, only the LOS angle, LOS angle rate, and missile overload are used in the RL model to generate the guidance command, which makes the method easy to implement. Simulation results and comparative experiments demonstrate that the proposed RL-based guidance method achieves better guidance accuracy and a higher success rate. This performance suggests that RL-based guidance is promising for maneuvering targets and deserves further investigation.
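The shaping reward described above, the inverse ratio of the absolute LOS angle rate, rewards states where the LOS rate is near zero (a parallel-like approach). A minimal sketch of such a reward follows; the scale `k` and the small `eps` term that keeps the reward bounded are assumptions for illustration, not parameters from the paper.

```python
# Sketch of an inverse-|LOS rate| shaping reward: driving the LOS angle
# rate toward zero yields a large reward. k and eps are assumed values.

def shaping_reward(q_dot, k=1.0, eps=1e-3):
    """Reward grows as |LOS rate| shrinks; bounded by k/eps at q_dot = 0."""
    return k / (abs(q_dot) + eps)

# A near-zero LOS rate is rewarded far more strongly than a large one:
r_small = shaping_reward(0.001)  # large reward near the parallel approach
r_large = shaping_reward(0.5)    # small reward for a fast-rotating LOS
```

The steep gradient near zero is what makes this a useful shaping signal: every reduction in |LOS rate| is rewarded immediately, rather than only at intercept.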