Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems

Škach, Jan; Kiumarsi, Bahare; Lewis, Frank L.; Straka, Ondřej

doi:10.1109/tcyb.2016.2618926

Cited by 53 publications

(11 citation statements)

References 37 publications


“…The RBF‐NN approximation requires a lot of prior information for the unknown nonlinear system because they use the localizable functions. When the above information is not available, the actor‐critic (AC) methods use two estimators, such as neural networks, to complete the policy search work. However, the control depends on the accuracy of the neural estimators.…”

Section: Introduction (mentioning)

confidence: 99%

Robust control under worst‐case uncertainty for unknown nonlinear systems using modified reinforcement learning

Perrusquía

2020

Intl J Robust & Nonlinear


Reinforcement learning (RL) is an effective method for the design of robust controllers of unknown nonlinear systems. Normal RLs for robust control, such as actor-critic (AC) algorithms, depend on the estimation accuracy. Uncertainty in the worst case requires a large state-action space, which causes overestimation and computational problems. In this article, the RL method is modified with the k-nearest neighbor and the double Q-learning algorithm. The modified RL does not need the neural estimator as AC does and can stabilize the unknown nonlinear system under the worst-case uncertainty. The convergence property of the proposed RL method is analyzed. The simulations and the experimental results show that our modified RLs are much more robust compared with the classic controllers, such as the proportional-integral-derivative, the sliding mode, and the optimal linear quadratic regulator controllers.

Keywords: k-nearest neighbors, double estimator, overestimation, robust reward, state-action space, worst-case uncertainty

Introduction

The objective of robust control is to achieve robust performance in the presence of disturbances. Most robust controllers are inspired by optimal control theory, such as H2 control [1,2], which minimizes a certain cost function to find an optimal controller. The most popular H2 controller is the linear quadratic regulator (LQR) [3]. It does not work well in the presence of disturbances. H∞ control can find a robust controller when the system has disturbances. Its performance is poor compared with H2 control [4]. The combination of H2 and H∞, called H2/H∞ control, has both advantages, that is, it has optimal performance with bounded disturbances [5]. The H2/H∞ controller design needs complete knowledge of the system dynamics [6]. These controllers are model-based. The time-varying quadratic optimization can be calculated by the zeroing neural network. It can simultaneously achieve finite-time convergence and inherent noise tolerance [7]. However, it requires perfect activation functions. Model-free controllers, like proportional-integral-derivative (PID) control [8,9], sliding mode control (SMC) [10], and neural control [11-16], among others, do not require dynamic knowledge of the system. However, parameter tuning and some prior knowledge of the disturbances prevent these model-free controllers from performing optimally.

Reinforcement learning (RL) [17,18] is another effective method without models. It is designed in the sense of H2 control [19]. The temporal difference (TD) rule, such as Q-learning, is applied to find an optimal solution for Markov decision processes [20]. The advantage of RL over the other model-free methods is that it can reach optimal performance. Recent results show that RL methods can learn H2 and H∞ controllers without system dynamics [21]. The main objective of RL for H2 and H∞ is to minimize the total accumulative reward. For a robust controller, the reward is designed in the sense of control problems H2 and H∞. This robust reward can be static or dynam...
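The abstract above names double Q-learning as its remedy for overestimation. As a rough illustration only, the following sketch shows the standard tabular double Q-learning update in its usual reward-maximizing form. It is not the citing paper's algorithm: the k-nearest-neighbor component and the robust reward are omitted, and the function name, table layout, and step sizes are assumptions made for this example.

```python
import random

def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.9, rng=random):
    """One tabular double Q-learning step.

    qa, qb: two independent Q-tables, indexed as table[state][action].
    rng: any object with a random() method (hypothetical hook for testing).
    """
    # Pick at random which table learns; the other one evaluates.
    if rng.random() < 0.5:
        learn, evaluate = qa, qb
    else:
        learn, evaluate = qb, qa
    # The greedy next action is selected with the table being updated...
    actions = range(len(learn[s_next]))
    a_star = max(actions, key=lambda j: learn[s_next][j])
    # ...but its value is read from the other table, which decouples
    # action selection from evaluation and curbs overestimation.
    target = r + gamma * evaluate[s_next][a_star]
    learn[s][a] += alpha * (target - learn[s][a])
    return qa, qb
```

Keeping two tables doubles the memory cost, which is the usual trade-off for the reduced bias of the bootstrapped target.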


“…This problem was solved in work [22] by applying an adaptive controller with second level adaptation. This solution was further studied for nonlinear system with linear parametrization [23], fractional system [24], observer design [25], [26] and artificial intelligence [27], [28]. The extension of this technique was also proposed in work [29] with second level adaptation based on error integration.…”

Section: Introduction (mentioning)

confidence: 99%

Reset Strategy for Output Feedback Multiple Models MRAC Applied to DEAP

2020


Smart actuators have developed rapidly in recent years. Dielectric Electroactive Polymer (DEAP) actuators are very important smart actuators due to features such as softness, high force ratio, fast operation, and silence. In recent years, a set of dynamic models for DEAP actuators has been developed by various authors. Relying on these models, it is possible to design a wide range of feedback controllers. In our work, we develop an indirect adaptive controller for a DEAP actuator exploiting the multiple models approach with second layer adaptation. The results presented in this paper prove that in the case of piecewise continuous parameters, the benefits of second level adaptation can be lost. To solve this problem, a new resetting algorithm is proposed. The efficiency of the proposed control method is verified by a simulation on a simple motivating example and a DEAP actuator model.


“…The Actor-Critic method, which combines the value-based method and the policy-based method, adopts policy-based method to update the policy, and adopts the value function as the evaluation method of the policy [26,27,28]. By introducing the value function as the evaluation criterion in the policy search, the loss of sequential difference about the reward can be minimized, so that the variance of the policy gradient estimation can be reduced effectively.…”

Section: Introduction (mentioning)

confidence: 99%

Reinforcement Learning-Based End-to-End Parking for Automatic Parking System

Zhang, Xiong, et al. 2019

Sensors


In existing mainstream automatic parking systems (APS), a parking path is first planned based on the parking slot detected by the sensors. Subsequently, the path tracking module guides the vehicle to track the planned parking path. However, since the vehicle dynamics are nonlinear, path tracking error inevitably occurs, leading to inclination and deviation of the parking. Accordingly, in this paper, a reinforcement learning-based end-to-end parking algorithm is proposed to achieve automatic parking. The vehicle can continuously learn and accumulate experience from numerous parking attempts and then learn the command of the optimal steering wheel angle at different parking slots. Based on this end-to-end parking, errors caused by path tracking can be avoided. Moreover, to ensure that the parking slot can be obtained continuously in the process of learning, a parking slot tracking algorithm is proposed based on the combination of vision and vehicle chassis information. Furthermore, given that the learning network output is hard to converge and easily falls into a local optimum during the parking process, several reinforcement learning training methods in terms of parking conditions are developed. Lastly, a real vehicle test demonstrates that the proposed method achieves a better parking attitude than the path planning and path tracking-based method.
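The citation statement for this entry describes the actor-critic structure: a policy-based update for the policy, with a value function as its evaluator to reduce the variance of the policy gradient estimate. A minimal textbook-style sketch of that idea is given below, assuming a tabular softmax policy and a one-step temporal-difference critic. It is not the off-policy algorithm of Škach et al., nor the parking method of the citing paper, and all names and step sizes are hypothetical.

```python
import math

def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_step(theta, v, s, a, r, s_next, done,
                      alpha_actor=0.1, alpha_critic=0.2, gamma=0.99):
    """One-step actor-critic update.

    theta[s][a]: action preferences of the softmax policy (the actor).
    v[s]: state-value estimates (the critic).
    Returns the TD error used for both updates.
    """
    # Critic: the TD error compares the bootstrapped target with the
    # current value estimate of state s (computed before v[s] changes).
    target = r + (0.0 if done else gamma * v[s_next])
    td_error = target - v[s]
    v[s] += alpha_critic * td_error
    # Actor: policy-gradient step in which the TD error replaces the raw
    # return, which is what lowers the variance of the gradient estimate.
    probs = softmax(theta[s])
    for j in range(len(theta[s])):
        grad_log = (1.0 if j == a else 0.0) - probs[j]
        theta[s][j] += alpha_actor * td_error * grad_log
    return td_error
```

A positive TD error raises the preference for the action just taken and lowers the others; a negative one does the reverse.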
