Beyond-visual-range (BVR) engagement is becoming increasingly common in modern air warfare. The key difficulty for pilots in such a fight is maneuver planning, which reflects the tactical decision-making capacity of both sides and determines success or failure. In this paper, we propose an intelligent maneuver planning method for BVR combat using an improved deep Q-network (DQN). First, a basic combat environment is built, consisting mainly of a flight motion model, a relative motion model, and a missile attack model. Then, we create a maneuver decision framework for agent interaction with the environment. Basic perceptual variables are constructed for the agents to form a continuous state space. In addition, considering the missile threat from each side and airfield constraints, a reward function is designed for agent training. Next, we introduce a training algorithm and propose perceptual situation layers and value-fitting layers to replace the policy network in DQN. Based on long short-term memory (LSTM) cells, the perceptual situation layer converts basic states into a high-dimensional perceived situation, while the fitting layer maps that situation to action values. Finally, three combat scenarios are designed for agent training and testing. Simulation results show that the agent can evade enemy threats while accumulating its own advantages to threaten the target. This demonstrates that the models and methods are valid and that intelligent air combat can be realized.
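The network structure the abstract describes, an LSTM-based perception layer that turns basic states into a high-dimensional situation, followed by value-fitting layers that map the situation to per-maneuver Q-values, can be sketched as follows. The layer sizes, weight shapes, and the choice of 6 state variables and 7 discrete maneuvers are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: basic state x updates the hidden 'situation' h."""
    H = h.size
    z = W @ x + U @ h + b           # stacked pre-activations for all 4 gates
    i = sigmoid(z[:H])              # input gate
    f = sigmoid(z[H:2 * H])         # forget gate
    o = sigmoid(z[2 * H:3 * H])     # output gate
    g = np.tanh(z[3 * H:])          # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def q_values(state_seq, params):
    """Perception layer (LSTM unrolled over the state history) plus a
    linear value-fitting layer producing one Q-value per maneuver."""
    W, U, b, V, bv = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in state_seq:             # basic states -> perceived situation
        h, c = lstm_step(x, h, c, W, U, b)
    return V @ h + bv

S, H, A = 6, 32, 7                  # assumed: 6 state vars, 7 maneuvers
params = (rng.normal(size=(4 * H, S)) * 0.1,
          rng.normal(size=(4 * H, H)) * 0.1,
          np.zeros(4 * H),
          rng.normal(size=(A, H)) * 0.1,
          np.zeros(A))
q = q_values(rng.normal(size=(10, S)), params)   # 10-step state history
```

The agent would then pick the maneuver with the largest entry of `q` (greedily, or epsilon-greedily during training).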
To address the lack of intelligence of virtual opponents in human-machine confrontation semi-physical simulation environments, we propose applying deep reinforcement learning to tactical decision-making in order to build an AI virtual pilot with self-confrontation and self-learning abilities. First, flight dynamics and kinematics are used to build basic flight models in the simulation environment, and a missile attack zone is established as the weapon model. Second, inspired by the agent-environment interaction framework of reinforcement learning, a tactical decision architecture for the flight agent is organized around the one-on-one tactical confrontation process. Finally, the improved DQN method is used to fit the value function over the continuous state space, and network training is completed through agent self-play and human-machine confrontation. The well-trained AI model can take on the role of a virtual opponent in the human-machine confrontation environment and shows a certain degree of intelligence when confronting human pilots.
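The core value-fitting step in any DQN variant is the temporal-difference update toward the Bellman target. The sketch below uses a linear Q-approximation, an assumed discount factor, and an assumed learning rate purely for illustration; the paper's improved DQN replaces the linear model with the network described above:

```python
import numpy as np

GAMMA = 0.95   # assumed discount factor
ALPHA = 0.01   # assumed learning rate

def td_target(reward, next_q, done):
    """DQN target: y = r if terminal, else y = r + gamma * max_a Q(s', a)."""
    return reward if done else reward + GAMMA * np.max(next_q)

def dqn_update(W, state, action, reward, next_state, done):
    """One semi-gradient step on a linear Q-network Q(s) = W @ s
    (a stand-in for the paper's network) toward the TD target."""
    q = W @ state
    y = td_target(reward, W @ next_state, done)
    td_error = y - q[action]
    W[action] += ALPHA * td_error * state
    return td_error

rng = np.random.default_rng(1)
W = np.zeros((4, 3))                 # assumed: 4 maneuvers, 3 state features
s, s2 = rng.normal(size=3), rng.normal(size=3)
err = dqn_update(W, s, action=2, reward=1.0, next_state=s2, done=False)
```

In self-play training, both agents would generate `(state, action, reward, next_state, done)` transitions against each other and apply this update from a replay buffer.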