Reinforcement learning (RL) has been successfully applied to motion control without requiring accurate models or careful selection of control parameters. In this paper, we propose a novel RL algorithm based on proximal policy optimization with dimension-wise clipping (PPO-DWC) for attitude control of a quadrotor. First, a dimension-wise clipping technique is introduced to solve the zero-gradient problem of the PPO algorithm, allowing fast convergence while maintaining good sampling efficiency and thus improving control performance. Moreover, following the idea of a stability augmentation system (SAS), a feedback controller is designed and integrated into the environment before the PPO controller is trained, which avoids ineffective exploration and improves the system's convergence. The eventual controller consists of two parts: the output of the actor neural network in the PPO algorithm and the output of the stability-augmentation feedback controller. Both map the system state directly to control commands in an end-to-end fashion. This control architecture is applied to quadrotor attitude control. Simulation results show that, after training with the improved PPO algorithm, the quadrotor tracks commands quickly and accurately with a small steady-state error. Furthermore, compared with a traditional PID controller and the basic PPO algorithm, the proposed PPO-DWC algorithm with the stability-augmentation framework achieves better tracking accuracy and robustness.
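To make the two ideas in this abstract concrete, the sketch below shows (i) a dimension-wise clipped PPO surrogate loss, where the per-dimension importance ratios of a factorized Gaussian policy are clipped independently so that clipping in one action dimension does not zero out gradients in the others, and (ii) a composite control law summing the SAS feedback term and the actor-network output. This is a minimal PyTorch sketch under those assumptions; the function and parameter names (`ppo_dwc_loss`, `K_sas`, etc.) are illustrative, not the authors' code.

```python
import torch

def ppo_dwc_loss(logp_new, logp_old, adv, eps=0.2):
    """Dimension-wise clipped PPO surrogate loss (sketch).

    logp_new, logp_old: per-dimension log-probabilities of the taken
        actions, shape (batch, action_dim), assuming a factorized
        Gaussian policy.
    adv: advantage estimates, shape (batch,).
    """
    ratio = torch.exp(logp_new - logp_old)            # per-dimension ratios
    a = adv.unsqueeze(-1)                             # broadcast over dims
    unclipped = ratio * a
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * a
    # Clipping each dimension independently avoids the zero-gradient case
    # where one extreme joint ratio kills the gradient for every dimension.
    return -torch.min(unclipped, clipped).mean()

def composite_action(state, actor, K_sas):
    """u = u_SAS + u_RL: stability-augmentation feedback plus actor output."""
    u_sas = -state @ K_sas.T    # linear SAS feedback (illustrative gain)
    u_rl = actor(state)         # end-to-end actor-network command
    return u_sas + u_rl
```

With this split, the SAS term keeps the closed loop near stability during early training, so the actor only needs to explore around an already-reasonable baseline rather than from scratch.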
In this paper, a novel deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) is proposed to achieve fixed-point flight control of a quadrotor. The attitude and position information of the quadrotor is mapped directly to the PWM signals of the four rotors through neural-network control. To constrain the size of policy updates, a PPO variant based on Monte Carlo approximations is proposed to obtain the optimal penalty coefficient. A policy optimization method with a penalized point probability distance preserves policy diversity during each policy update. The new surrogate objective function is introduced into the actor–critic network, which alleviates PPO's tendency to fall into local optima. Moreover, a compound reward function is presented that accelerates the gradient algorithm along the policy-update direction by analyzing the various states the quadrotor may encounter in flight, improving the learning efficiency of the network. The simulation tests the generalization ability of the offline policy by changing the wing length and payload of the quadrotor. Compared with the standard PPO method, the proposed method has higher learning efficiency and better robustness.
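The penalized surrogate described above can be sketched as follows. This is a minimal, hedged PyTorch sketch of a point-probability-distance penalty in the style of POP3D-like objectives; the exact penalty form, the fixed coefficient `beta` (which the paper instead tunes via Monte Carlo approximation), and all names are assumptions for illustration, not the paper's formulation.

```python
import torch

def ppd_penalized_loss(logp_new, logp_old, adv, beta=5.0):
    """PPO surrogate with a penalized point probability distance (sketch).

    logp_new, logp_old: log-probabilities of the taken actions under the
        new and old policies, shape (batch,).
    adv: advantage estimates, shape (batch,).
    beta: penalty coefficient (assumed fixed here for simplicity).
    """
    ratio = torch.exp(logp_new - logp_old)
    # Squared difference of the point probabilities of the sampled action:
    # penalizes large policy shifts without hard clipping, so the gradient
    # signal (and hence policy diversity) survives every update.
    ppd = (torch.exp(logp_new) - torch.exp(logp_old)).pow(2)
    return -(ratio * adv - beta * ppd).mean()
```

Unlike the hard clip in standard PPO, this penalty shapes the objective smoothly, which is the mechanism the abstract credits for escaping local optima.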