In this article, a safe deep reinforcement learning (DRL) control method based on safe reward shaping is proposed and applied to the constrained control of an electro-hydraulic servo system (EHSS). The proposed control method improves the safety of the constrained control of a nonlinear system with minimal intervention in the optimization of the performance objective, while accelerating the convergence of the DRL process. By introducing control barrier functions (CBFs) into the reward shaping, a CBF-based potential difference term is designed to shape the safe reward, which not only provides safe guidance for the DRL process by encoding the safety constraints of the nonlinear system, but also accounts for the effects of the complex safety transformation on the convergence of the DRL. The safe-reward-based DRL control method is then presented to learn the optimal safe policy for position tracking of the EHSS with position-error constraints, by planning and optimizing safety together with the performance objective. Theoretical analysis demonstrates that the proposed control method with the safe reward achieves the optimal safety performance for the constrained control system. Experimental results for the constrained control of the EHSS with system uncertainties and perturbations are also presented, showing that the proposed control method converges quickly and performs more safely and better than conventional control methods.
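To make the reward-shaping idea concrete, the following is a minimal sketch of a CBF-based potential difference term of the kind described above. The barrier function, the potential form, the error bound E_MAX, and the assumption that the tracking error is the first state component are all illustrative assumptions, not the article's exact formulation.

```python
import numpy as np

E_MAX = 0.05   # assumed position-error bound for the EHSS (illustrative value)
GAMMA = 0.99   # assumed discount factor of the DRL agent

def cbf(state):
    """Assumed CBF encoding the error constraint: h(x) = e_max^2 - e^2 >= 0 when safe."""
    error = state[0]                      # assumption: state[0] is the tracking error
    return E_MAX**2 - error**2

def potential(state):
    """Potential built from the CBF; grows strongly negative near the constraint boundary."""
    return np.log(max(cbf(state), 1e-6))  # clipped log-barrier, a common choice

def shaped_reward(reward, state, next_state):
    """Potential-difference shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return reward + GAMMA * potential(next_state) - potential(state)
```

In this sketch the shaped reward penalizes transitions that move the state toward the constraint boundary and rewards those that move away from it, which is one way to provide the "safe guidance" to the DRL process without altering the optimal policy of the underlying performance objective.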
KEYWORDS
constrained control, control barrier function, deep reinforcement learning, electro-hydraulic servo system, safe reward shaping
INTRODUCTION
Deep reinforcement learning (DRL) has been widely used for nonlinear systems with uncertainties and complex dynamics, such as controlling robots 1-3 and self-driving vehicles, 4,5 because of its powerful ability to capture features from high-dimensional data and learn complicated control policies. 6 Several classical DRL algorithms, including asynchronous advantage actor-critic (A3C), 7 deep Q-network, 8 and deep deterministic policy gradient (DDPG), 9 have been developed