2016
DOI: 10.1016/j.automatica.2015.10.039

Model-based reinforcement learning for approximate optimal regulation

Abstract: In deterministic systems, reinforcement learning-based online approximate optimal control methods typically require a restrictive persistence of excitation (PE) condition for convergence. This paper presents a concurrent learning-based solution to the online approximate optimal regulation problem that eliminates the need for PE. The development is based on the observation that given a model of the system, the Bellman error, which quantifies the deviation of the system Hamiltonian from the optimal Hamilt…


Cited by 169 publications (19 citation statements)
References 36 publications
“…Similar to [36], the technique developed in this result implements simulation of experience in a model-based RL scheme by using the system model to extrapolate the approximate BE to unexplored areas of the state space. In the following, the trajectories of the state and the weight estimates W c and W a , evaluated at time t starting from appropriate initial conditions are denoted by x (t), W c (t) and W a (t), respectively.…”
Section: Velocity Estimator Design
mentioning
confidence: 99%
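The "simulation of experience" idea quoted above can be illustrated with a minimal sketch: given a known model, the Bellman (HJB) error of a value-function approximation can be evaluated at arbitrary sampled states, not just along the measured trajectory. This is an illustrative toy example, not the paper's implementation; all names (`A`, `B`, `Q`, `R`, `P_hat`) are assumed for the sketch.

```python
import numpy as np

# Hypothetical sketch of model-based Bellman-error extrapolation:
# given a known model x_dot = A x + B u, running cost x^T Q x + u^T R u,
# and a quadratic value-function approximation V(x) = x^T P_hat x, the
# HJB residual can be evaluated at ANY sampled state x, not only along
# the measured trajectory. All names here are illustrative.

A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

def bellman_error(P_hat, x):
    """HJB residual at state x; it vanishes everywhere iff P_hat is optimal."""
    u = -np.linalg.solve(R, B.T @ P_hat @ x)   # policy induced by V
    x_dot = A @ x + B @ u                      # model replaces a measurement
    grad_V = 2.0 * P_hat @ x
    return grad_V @ x_dot + x @ Q @ x + u @ R @ u

# Extrapolate the error over off-trajectory sample states, sidestepping
# the persistence-of-excitation requirement on the actual trajectory.
samples = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, -1.0])]
P_hat = np.eye(2)
errors = [bellman_error(P_hat, x) for x in samples]
```

In a learning scheme, these extrapolated residuals would drive the critic weight update in place of trajectory data alone.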
“…Concurrent learning (CL) [29][30][31] claims to relax the PE condition by storing information-rich past data along the system trajectory and concurrently using it along with the instantaneous data in the estimator design. A rank condition on a matrix, formed out of stored data, is sufficient to guarantee parameter convergence in the CL-based methods.…”
Section: Analytical Comparison With Existing Literature
mentioning
confidence: 99%
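The concurrent-learning mechanism described in this quote can be sketched as follows: a linear-in-parameters model is estimated with gradient steps that combine the instantaneous data point with a stored stack of past regressor/measurement pairs, and convergence hinges on a rank condition on that stack rather than on PE. This is a minimal illustrative sketch; the function and variable names are assumptions, not from the cited works.

```python
import numpy as np

# Minimal concurrent-learning sketch (illustrative only): a model
# y = phi^T theta is estimated by gradient descent that uses the
# instantaneous data point together with stored (phi_i, y_i) pairs.

def cl_update(theta_hat, phi_now, y_now, Phi_hist, y_hist, gamma=0.1):
    """One concurrent-learning gradient step on the squared prediction error."""
    grad = phi_now * (phi_now @ theta_hat - y_now)   # instantaneous term
    for phi_i, y_i in zip(Phi_hist, y_hist):         # stored-data terms
        grad += phi_i * (phi_i @ theta_hat - y_i)
    return theta_hat - gamma * grad

# Rank condition: parameter convergence requires the stored regressor
# stack to have full column rank, i.e., to span the parameter space.
Phi_hist = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta_true = np.array([2.0, -1.0])
y_hist = Phi_hist @ theta_true
assert np.linalg.matrix_rank(Phi_hist) == 2

theta = np.zeros(2)
for _ in range(300):
    theta = cl_update(theta, Phi_hist[0], y_hist[0], Phi_hist, y_hist)
# theta now approximates theta_true even though the "live" regressor is
# constant (not persistently exciting), because the stored stack is rich.
```

The rank check plays the role PE would otherwise play: once the stored stack spans the parameter space, the combined gradient is a contraction toward the true parameters.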
“…Recent works [28][29][30] on learning and data-driven control methods have shown promise in improving tracking performance by using stored input-output data along the system trajectory which carries sufficient information about the unknown parameters. Girish et al [29,31] proposed a novel approach, coined as concurrent learning (CL) based model reference adaptive control (MRAC), where information-rich past data is stored and concurrently used along with gradient based parameter update laws.…”
Section: Introduction
mentioning
confidence: 99%
“…for all Z_p such that Z_p ∈ χ_p and Z_p ≥ v_lp^{-1}(ι_p). Using the bounds in (40), the sufficient conditions in (41)–(43), and the inequality in (46), Theorem 4.18 in [47] can be invoked to conclude that every trajectory…”
Section: Stability Analysis
mentioning
confidence: 99%