2014
DOI: 10.1007/s11432-013-4954-y
|View full text |Cite
|
Sign up to set email alerts
|

A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture

Abstract: A novel self-learning optimal control method for a class of discrete-time nonlinear systems is proposed based on iteration adaptive dynamic programming (ADP) algorithm. It is proven that the iteration costate functions converge to the optimal one, and a detailed convergence analysis of the iteration ADP algorithm is given. Furthermore, echo state network (ESN) architecture is used as the approximator of the costate function for each iteration. To ensure the reliability of the ESN approximator, the ESN mean squ… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1

Citation Types

0
8
0

Year Published

2015
2015
2022
2022

Publication Types

Select...
7
1

Relationship

1
7

Authors

Journals

citations
Cited by 22 publications
(8 citation statements)
references
References 35 publications
0
8
0
Order By: Relevance
“…Dynamic Reservoir contains a large number of sparsely connected neurons, which mimics the working principle of the human brain neurons. They accept information from the input layer just as the human brain neurons receive stimuli from the outside world [14].…”
Section: Basic Esnmentioning
confidence: 99%
“…Dynamic Reservoir contains a large number of sparsely connected neurons, which mimics the working principle of the human brain neurons. They accept information from the input layer just as the human brain neurons receive stimuli from the outside world [14].…”
Section: Basic Esnmentioning
confidence: 99%
“…Many approaches have been proposed to obtain the approximate solution of the HJB equation such as the adaptive dynamic programming (DP). This technique is classified into several schemes including the heuristic DP [10], dual heuristic DP [11], action-dependent DP, and Q-learning DP [12]. However, the DP policy is not computationally tenable to run and solves the time-varying HJB equations [13].…”
Section: Introductionmentioning
confidence: 99%
“…Meanwhile, for traditional ADP methods, the solution to infinite‐horizon optimal control of discrete‐time nonlinear systems is based on value iterations and policy iterations . The training requires large number of iterations.…”
Section: Introductionmentioning
confidence: 99%