Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence

Dierks, Travis; Thumati, Balaje T.; Jagannathan, S.

doi:10.1016/j.neunet.2009.06.014

Cited by 181 publications

(75 citation statements)

References 6 publications

“…where $Z(t_0) \in \mathbb{R}$ is a positive constant that depends on the initial condition of the system, the observer in (6) along with the adaptive update law in (9) and the controller in (19) along with the adaptive update laws in (24) and (26) ensure that the state $x(t)$, the state estimation error $\tilde{x}(t)$, the parameter estimation error $\tilde{\theta}(t)$, the value function weight estimation error $\tilde{W}_c(t)$ and the policy weight estimation error $\tilde{W}_a(t)$ are UUB, resulting in UUB convergence of the policy $\hat{u}(x(t), \hat{W}_a(t))$ to the optimal policy $u^*(x(t))$.…”

Section: Theorem 1 Provided Assumptions (mentioning)

confidence: 99%

Model-based reinforcement learning for approximate optimal regulation

2016

Abstract: In deterministic systems, reinforcement learning-based online approximate optimal control methods typically require a restrictive persistence of excitation (PE) condition for convergence. This paper presents a concurrent learning-based solution to the online approximate optimal regulation problem that eliminates the need for PE. The development is based on the observation that given a model of the system, the Bellman error, which quantifies the deviation of the system Hamiltonian from the optimal Hamiltonian, can be evaluated at any point in the state space. Further, a concurrent learning-based parameter identifier is developed to compensate for parametric uncertainty in the plant dynamics. Uniformly ultimately bounded (UUB) convergence of the system states to the origin, and UUB convergence of the developed policy to the optimal policy are established using a Lyapunov-based analysis.

“…In order to extend the validity of the results to different initial conditions within a pre-selected set, in [9] a solution is found as the local optimum in the sense that it minimizes the worst possible cost for all trajectories starting in the selected initial states set. In the past two decades, approximate dynamic programming (ADP) has been shown to have a lot of promise in providing comprehensive solutions to conventional optimal control problems in a feedback form [17][18][19][20][21][22][23][24][25][26][27][28]. ADP is usually carried out using two neural network (NN) syntheses called adaptive critics (ACs) [18][19][20].…”

Section: Introduction (mentioning)

confidence: 99%

Optimal switching between controlled subsystems with free mode sequence

Heydari, Balakrishnan

2015

Neurocomputing

“…Then, a second OLA is utilized that minimizes the HJB function based on the information provided by the first OLA. Knowledge of the internal system dynamics is not required while the control coefficient matrix alone is needed although it can be relaxed by introducing an additional OLA [11].…”

Section: Introduction (mentioning)

confidence: 99%

Online optimal control of nonlinear discrete-time systems using approximate dynamic programming

Dierks, Jagannathan

2011

J. Control Theory Appl.

Self Cite

In this paper, the optimal control of a class of general affine nonlinear discrete-time (DT) systems is undertaken by solving the Hamilton-Jacobi-Bellman (HJB) equation online and forward in time. The proposed approach, normally referred to as adaptive or approximate dynamic programming (ADP), uses online approximators (OLAs) to solve the infinite horizon optimal regulation and tracking control problems for affine nonlinear DT systems in the presence of unknown internal dynamics. Both the regulation and tracking controllers are designed using OLAs to obtain the optimal feedback control signal and its associated cost function. Additionally, the tracking controller design entails a feedforward portion that is derived and approximated using an additional OLA for steady state conditions. Novel update laws for tuning the unknown parameters of the OLAs online are derived. Lyapunov techniques are used to show that all signals are uniformly ultimately bounded and that the approximated control signals approach the optimal control inputs with small bounded error. In the absence of OLA reconstruction errors, an optimal control is demonstrated. Simulation results verify that all OLA parameter estimates remain bounded, and the proposed OLA-based optimal control scheme tunes itself to reduce the cost HJB equation.
