2018
DOI: 10.1109/tcyb.2016.2618926

Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems

Abstract: In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control problem of multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of the system's possible behaviors. A new model is added if a sudden change of system behavior is detected from the measured data and the behav…
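The abstract breaks off before saying how responsibility signals are computed or how a "sudden change" is detected. As a hedged illustration only, the sketch below assumes each candidate model is a scalar linear predictor x_{k+1} = a*x_k + b*u_k and uses the Gaussian soft-max weighting often seen in multiple-model control in place of the paper's self-organizing map. The functions `prediction_errors`, `responsibilities` and `novelty_detected`, the noise scale `sigma`, and the threshold rule are all hypothetical, not taken from the paper.

```python
import math

def prediction_errors(models, x, u, x_next):
    """One-step prediction error of each candidate model.

    Each model is a hypothetical scalar pair (a, b) for x_{k+1} = a*x_k + b*u_k.
    """
    return [x_next - (a * x + b * u) for a, b in models]

def responsibilities(errors, sigma=1.0):
    """Soft-max responsibility signals from prediction errors (assumed form)."""
    # Log of a Gaussian likelihood for each model's prediction error
    logits = [-(e * e) / (2.0 * sigma ** 2) for e in errors]
    # Subtract the largest logit before exponentiating to avoid underflow
    m = max(logits)
    w = [math.exp(z - m) for z in logits]
    total = sum(w)
    return [wi / total for wi in w]

def novelty_detected(errors, threshold):
    """Assumed rule: flag a new behavior when no model explains the sample."""
    return min(abs(e) for e in errors) > threshold

models = [(0.9, 1.0), (0.5, 2.0)]
errs = prediction_errors(models, 1.0, 0.5, 1.4)
print(responsibilities(errs, sigma=0.1))  # first model receives most weight
```

The responsibilities always sum to one, so they can be used directly as mixing weights for the per-model controllers; a sample that every existing model predicts poorly is what would trigger adding a new model under this assumed threshold rule.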

Cited by 53 publications (11 citation statements)
References: 37 publications
“…The RBF‐NN approximation requires a lot of prior information for the unknown nonlinear system because they use the localizable functions. When the above information is not available, the actor‐critic (AC) methods use two estimators, such as neural networks, to complete the policy search work. However, the control depends on the accuracy of the neural estimators.…”
Section: Introduction (mentioning; confidence: 99%)
“…This problem was solved in work [22] by applying an adaptive controller with second level adaptation. This solution was further studied for nonlinear system with linear parametrization [23], fractional system [24], observer design [25], [26] and artificial intelligence [27], [28]. The extension of this technique was also proposed in work [29] with second level adaptation based on error integration.…”
Section: Introduction (mentioning; confidence: 99%)
“…The Actor-Critic method, which combines the value-based method and the policy-based method, adopts policy-based method to update the policy, and adopts the value function as the evaluation method of the policy [26,27,28]. By introducing the value function as the evaluation criterion in the policy search, the loss of sequential difference about the reward can be minimized, so that the variance of the policy gradient estimation can be reduced effectively.…”
Section: Introduction (mentioning; confidence: 99%)
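The actor-critic structure described in these excerpts (a critic that evaluates the current policy, an actor that improves it) and the model-free off-policy idea in the abstract can be illustrated on a deliberately simplified case. This is a minimal sketch, not the paper's algorithm: it assumes a single scalar plant x_{k+1} = a*x_k + b*u_k, regulation instead of tracking, quadratic cost q*x^2 + r*u^2, and noise-free data. The critic fits Q(x,u) = h11*x^2 + 2*h12*x*u + h22*u^2 by least squares from recorded data; the actor sets the gain K = h12/h22. All function names and parameter values are hypothetical.

```python
import random

def collect_data(a, b, k0, n=200, seed=0):
    """Run an exploratory behavior policy u = -k0*x + noise on the plant.

    The plant model is used only to generate data; the learner never sees a or b.
    """
    rng = random.Random(seed)
    x, data = 1.0, []
    for _ in range(n):
        u = -k0 * x + rng.uniform(-1.0, 1.0)  # behavior policy with exploration
        x_next = a * x + b * u
        data.append((x, u, x_next))
        x = x_next
    return data

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [vi] for row, vi in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, 3):
            f = m[i][c] / m[c][c]
            for j in range(c, 4):
                m[i][j] -= f * m[c][j]
    h = [0.0] * 3
    for i in (2, 1, 0):  # back substitution
        h[i] = (m[i][3] - sum(m[i][j] * h[j] for j in range(i + 1, 3))) / m[i][i]
    return h

def off_policy_q_learning(data, q, r, k0, iters=20):
    """Policy iteration on a quadratic Q-function, reusing the same data set."""
    k = k0
    for _ in range(iters):
        ata = [[0.0] * 3 for _ in range(3)]
        atb = [0.0] * 3
        for x, u, xn in data:
            un = -k * xn  # action of the target policy at the next state
            # Bellman: Q(x,u) - Q(xn, un) = q*x^2 + r*u^2, linear in (h11, h12, h22)
            phi = [x * x - xn * xn, 2 * (x * u - xn * un), u * u - un * un]
            cost = q * x * x + r * u * u
            for i in range(3):
                atb[i] += phi[i] * cost
                for j in range(3):
                    ata[i][j] += phi[i] * phi[j]
        h11, h12, h22 = solve3(ata, atb)  # critic: least-squares Q-function fit
        k = h12 / h22  # actor: greedy gain, u = -k*x minimizes Q(x, u)
    return k

k = off_policy_q_learning(collect_data(1.2, 1.0, k0=0.5), q=1.0, r=1.0, k0=0.5)
print(k)  # approaches the Riccati-optimal gain for a=1.2, b=1, q=r=1
```

Because the Bellman equation used by the critic holds for any recorded pair (x_k, u_k), the exploratory behavior data never has to be regenerated when the gain changes; that is the sense in which this sketch is off-policy. An initial stabilizing gain k0 is still assumed, as is usual for policy iteration.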