With the development of unmanned aircraft and artificial intelligence technology, air combat is moving toward unmanned, autonomous operation. In this paper, we introduce a new layered decision framework for the six-degree-of-freedom (6-DOF) within-visual-range (WVR) air-combat problem. The decision-making process is divided into two layers, each trained separately with reinforcement learning (RL). The upper layer is a combat policy that issues maneuver commands based on the current combat situation (such as altitude, speed, and attitude). The lower-layer control policy then converts these commands into input signals for the aircraft's actuators (aileron, elevator, rudder, and throttle). The control policy is modeled as a Markov decision process (MDP), while the combat policy is modeled as a partially observable Markov decision process (POMDP). We describe the two-layer training method in detail. For the control policy, we design rewards based on expert knowledge so that autonomous flight tasks are completed accurately and stably. For the combat policy, we introduce self-play-based curriculum learning, in which the agent plays against historical policies during training to improve performance. Experimental results show that the proposed method achieves a success rate of 85.7% against a game-theoretic baseline. It is also efficient, reducing training time by an average of 13.6% compared with an RL baseline.
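To make the layered structure concrete, the sketch below illustrates how an upper-level combat policy and a lower-level control policy might be composed in a simulation loop. This is a minimal, hypothetical Python sketch based only on the abstract; the class names (CombatPolicy, ControlPolicy), the environment interface, and the inner-loop step count are assumptions, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical sketch of the two-layer decision loop described in the abstract.
# All names and interfaces here are illustrative assumptions.

class CombatPolicy:
    """Upper layer (POMDP): maps the observed combat situation
    (altitude, speed, attitude, ...) to a maneuver command."""
    def act(self, observation: np.ndarray) -> np.ndarray:
        # Placeholder: e.g. desired heading, altitude, and speed targets.
        return np.zeros(3)

class ControlPolicy:
    """Lower layer (MDP): maps the aircraft state plus the maneuver command
    to actuator inputs (aileron, elevator, rudder, throttle)."""
    def act(self, state: np.ndarray, command: np.ndarray) -> np.ndarray:
        # Placeholder: four actuator signals.
        return np.zeros(4)

def run_episode(env, combat_policy, control_policy, inner_steps=10):
    """One episode: the combat policy issues a command, the control policy
    tracks it over several 6-DOF simulation steps, then the cycle repeats."""
    obs, state = env.reset()
    done = False
    while not done:
        command = combat_policy.act(obs)            # upper-layer decision
        for _ in range(inner_steps):                # lower-layer tracking
            controls = control_policy.act(state, command)
            obs, state, done = env.step(controls)
            if done:
                break
```

The separation of time scales (one maneuver command held over several control steps) is a common way to structure such hierarchies, but the specific update frequencies are not stated in the abstract.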