2009
DOI: 10.1016/j.neunet.2009.06.014
Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence

Cited by 181 publications (75 citation statements)
References 6 publications
“…where Z(t_0) ∈ R is a positive constant that depends on the initial condition of the system, the observer in (6) along with the adaptive update law in (9) and the controller in (19) along with the adaptive update laws in (24) and (26) ensure that the state x(t), the state estimation error x̃(t), the parameter estimation error θ̃(t), the value function weight estimation error W̃_c(t) and the policy weight estimation error W̃_a(t) are UUB, resulting in UUB convergence of the policy û(x(t), Ŵ_a(t)) to the optimal policy u*(x(t)).…”
Section: Theorem 1, Provided Assumptions
confidence: 99%
“…In order to extend the validity of the results to different initial conditions within a pre-selected set, in [9] a solution is found as the local optimum in the sense that it minimizes the worst possible cost over all trajectories starting in the selected initial-state set. In the past two decades, approximate dynamic programming (ADP) has shown considerable promise in providing comprehensive solutions to conventional optimal control problems in feedback form [17][18][19][20][21][22][23][24][25][26][27][28]. ADP is usually carried out using two neural network (NN) syntheses called adaptive critics (ACs) [18][19][20].…”
Section: Introduction
confidence: 99%
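The two-network adaptive-critic structure described in the excerpt above (a critic approximating the value function and an actor approximating the policy) can be sketched on a toy problem. The scalar affine system, quadratic cost, and linear-in-parameter approximators below are illustrative assumptions for this sketch, not the paper's actual setup:

```python
# Hedged sketch of an adaptive-critic (actor-critic) ADP iteration on an
# assumed scalar affine system x_{k+1} = a*x + b*u with cost q*x^2 + r*u^2.
a, b, q, r = 0.9, 0.5, 1.0, 1.0

# Critic: V(x) ~= p * x^2; Actor: u(x) ~= k * x (linear "networks").
p, k = 0.0, 0.0
for _ in range(200):
    # Critic step: fit V to the Bellman target under the current policy,
    # V(x) = q*x^2 + r*u^2 + V(x') with u = k*x and x' = (a + b*k)*x.
    p = q + r * k**2 + p * (a + b * k) ** 2
    # Actor step: policy minimizing q*x^2 + r*u^2 + p*(a*x + b*u)^2 in u.
    k = -p * a * b / (r + p * b**2)

# At convergence p satisfies the scalar discrete algebraic Riccati equation
# p = q + a^2*p - (a*b*p)^2 / (r + b^2*p), and a + b*k is stable (|.| < 1).
```

The fixed point of the alternating critic/actor updates coincides with the Riccati solution, which is why such toy problems are a common sanity check for ADP schemes.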
“…Then, a second OLA is utilized that minimizes the HJB function based on the information provided by the first OLA. Knowledge of the internal system dynamics is not required; only the control coefficient matrix is needed, although even this requirement can be relaxed by introducing an additional OLA [11].…”
Section: Introduction
confidence: 99%