Reinforcement learning (RL) is an effective method for designing robust controllers for unknown nonlinear systems. Standard RL approaches to robust control, such as actor-critic (AC) algorithms, depend on the accuracy of their estimators. Worst-case uncertainty requires a large state-action space, which causes overestimation and computational problems. In this article, the RL method is modified with the k-nearest neighbor rule and the double Q-learning algorithm. The modified RL does not need a neural estimator, as AC does, and can stabilize the unknown nonlinear system under worst-case uncertainty. The convergence property of the proposed RL method is analyzed. Simulation and experimental results show that our modified RL is much more robust than classic controllers, such as the proportional-integral-derivative, sliding mode, and optimal linear quadratic regulator controllers.
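For readers unfamiliar with the double-estimator idea named above, the following is a minimal tabular double Q-learning sketch, not the authors' implementation; the discretization sizes, learning parameters, and the epsilon-greedy rule are illustrative assumptions (the paper additionally uses a k-nearest-neighbor state-action representation, which is not shown here).

```python
import numpy as np

# Minimal tabular double Q-learning sketch. Two estimators QA and QB are
# kept; on each step one is chosen at random to be updated, while the other
# evaluates the greedy action. This decoupling is what reduces the
# overestimation bias of single-estimator Q-learning.

n_states, n_actions = 50, 5          # placeholder discretization sizes
alpha, gamma, eps = 0.1, 0.95, 0.1   # placeholder learning parameters
rng = np.random.default_rng(0)

QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def select_action(s):
    # epsilon-greedy over the sum of both estimators
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(QA[s] + QB[s]))

def double_q_update(s, a, r, s_next):
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))   # QA selects the greedy action ...
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])  # ... QB evaluates it
    else:
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
```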
KEYWORDS
k-nearest neighbors, double estimator, overestimation, robust reward, state-action space, worst-case uncertainty
INTRODUCTION
The objective of robust control is to achieve robust performance in the presence of disturbances. Most robust controllers are inspired by optimal control theory, such as H2 control,1,2 which minimizes a certain cost function to find an optimal controller. The most popular H2 controller is the linear quadratic regulator (LQR).3 It does not work well in the presence of disturbances. The H∞ control can find a robust controller when the system has disturbances, but its performance is poor compared with H2 control.4 The combination of H2 and H∞, called H2/H∞ control, has both advantages, that is, it has optimal performance with bounded disturbances.5 The H2/H∞ controller design needs complete knowledge of the system dynamics.6 These controllers are model-based. The time-varying quadratic optimization can be computed by the zeroing neural network, which simultaneously achieves finite-time convergence and inherent noise tolerance.7 However, it requires perfect activation functions. Model-free controllers, such as proportional-integral-derivative (PID) control,8,9 sliding mode control (SMC),10 and neural control,11-16 among others, do not require knowledge of the system dynamics. However, parameter tuning and the need for some prior knowledge of the disturbances prevent these model-free controllers from performing optimally.

Reinforcement learning (RL)17,18 is another effective model-free method. It is designed in the sense of H2 control.19 The temporal difference (TD) rule, such as Q-learning, is applied to find an optimal solution for Markov decision processes.20 The advantage of RL over the other model-free methods is that it can reach optimal performance. Recent results show that RL methods can learn H2 and H∞ controllers without the system dynamics.21
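To make the TD rule concrete, below is a minimal tabular Q-learning sketch; the `env.reset()`/`env.step()` interface and all hyperparameters are assumptions for illustration and are not taken from the cited works.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning: a TD method for finite Markov decision processes."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)    # assumed environment interface
            # temporal-difference update toward the one-step bootstrap target
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```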
The main objective of RL for H2 and H∞ control is to minimize the total accumulated reward. For a robust controller, the reward is designed in the sense of the H2 and H∞ control problems. This robust reward can be static or dynam...