Actor-Critic Reinforcement Learning for Control With Stability Guarantee
2020
DOI: 10.1109/lra.2020.3011351

Cited by 83 publications (50 citation statements)
References 25 publications
“…In [28], a multi-pseudo-Q-learning-based deterministic policy gradient algorithm was proposed to achieve high tracking-control accuracy for AUVs, and it validated that increasing the number of actors and critics can further improve performance. Recently, a data-based approach was presented for analyzing the stability of discrete-time nonlinear stochastic systems modeled as Markov decision processes, using the classic Lyapunov method from control theory [29]. Owing to the limited exploration ability caused by a deterministic policy, high-speed autonomous drifting is addressed in [30] with a closed-loop controller based on the deep RL algorithm soft actor-critic (SAC) that controls the steering angle and throttle of simulated vehicles.…”
Section: Introduction, A. Related Work
confidence: 99%
“…[19] proposes a straightforward approach to constructing Lyapunov functions for nonlinear systems using DNNs. Recently, asymptotic stability in model-free RL was established for robotic control tasks in [20]. Inspired by the works [19], [20], we will also parametrise the Lyapunov function as a DNN and learn its parameters from samples.…”
Section: Introduction
confidence: 99%
“…Recently, asymptotic stability in model-free RL was established for robotic control tasks in [20]. Inspired by the works [19], [20], we will also parametrise the Lyapunov function as a DNN and learn its parameters from samples. Thereafter, a new DRL algorithm based on the soft actor-critic algorithm [17], which incorporates the Lyapunov boundedness condition into the objective function to be optimised, is proposed.…”
Section: Introduction
confidence: 99%
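The two excerpts above restate the recipe the citing work borrows from [19], [20]: parametrise a Lyapunov candidate as a DNN and fold a Lyapunov decrease condition into a soft actor-critic style objective. A minimal sketch of that ingredient follows; it is illustrative only, not the cited authors' implementation, and the class name `LyapunovCritic`, the network sizes, the coefficient `alpha3`, and the use of a per-step cost signal are all assumptions.

```python
import torch
import torch.nn as nn


class LyapunovCritic(nn.Module):
    """Hypothetical Lyapunov candidate L_c(s, a) parametrised as a small MLP.

    Squaring the output keeps the candidate non-negative, one of the basic
    requirements on a Lyapunov function.
    """

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).pow(2)


def lyapunov_decrease_penalty(critic: LyapunovCritic,
                              s: torch.Tensor, a: torch.Tensor,
                              s_next: torch.Tensor, a_next: torch.Tensor,
                              cost: torch.Tensor,
                              alpha3: float = 0.1) -> torch.Tensor:
    """Penalise violations of L(s', a') - L(s, a) <= -alpha3 * c(s, a).

    The hinge (relu) keeps only violating transitions in the batch; the mean
    can then be added as a weighted term to an SAC-style policy loss.
    """
    violation = (critic(s_next, a_next) - critic(s, a)
                 + alpha3 * cost.reshape(-1, 1))
    return torch.relu(violation).mean()
```

In an SAC-style update, this penalty would typically be added to the actor loss with a (possibly learned) multiplier, pushing the policy toward actions under which the learned candidate decreases along sampled transitions.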
“…It implies that the tracking error converges to zero on the sliding surface s_L = 0, i.e. q_L → q_{Lr} and q̇_L → q̇_{Lr} as t → ∞. For the follower robot, the infinite-horizon return function serves as a Lyapunov function for the reinforcement learning system (Kamalapurkar et al., 2017; Han et al., 2020), such that: where γ ∈ [0,1] is the discounting factor. The reward function (13) is included in the Lyapunov function, and the reward is kept larger than zero by setting the term weights, i.e.…”
mentioning
confidence: 99%
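The equation the excerpt refers to after "such that:" is not reproduced in the snippet. For orientation only, a standard infinite-horizon discounted return used as a Lyapunov candidate has the form sketched below; the symbols L, s_t, a_t and r are assumed notation, not necessarily the citing paper's.

```latex
% Hedged sketch (assumed notation): infinite-horizon discounted return
% used as a Lyapunov candidate for the follower robot's RL system.
L(s_0) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad \gamma \in [0, 1].
```

Keeping the reward positive through the term weights, as the excerpt states, keeps this candidate positive along trajectories, which is the property the citing authors rely on.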