2020
DOI: 10.2139/ssrn.3734179
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

Abstract: We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular we consider the convergence of policy gradient methods in the setting of known and unknown parameters. We are able to produce a global linear convergence guarantee for this approach in the setting of finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems wit…
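The abstract describes policy gradient over time-varying feedback gains for a finite-horizon LQR with stochastic state dynamics. The following is a minimal illustrative sketch of that setup, not the paper's algorithm: the dynamics matrices, horizon, step size, and the two-point zeroth-order gradient estimator are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch only (not the paper's implementation): zeroth-order
# policy gradient on time-varying gains K_t for a noisy finite-horizon LQR,
#   x_{t+1} = A x_t + B u_t + w_t,   u_t = -K_t x_t,
# minimizing E[ sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Q x_T ].
# All dimensions, matrices, and hyperparameters below are assumed values.

rng = np.random.default_rng(0)
n, m, T = 2, 1, 5                       # state dim, input dim, horizon
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)
sigma = 0.05                            # process-noise standard deviation

def rollout_cost(K, n_mc=200):
    """Monte Carlo estimate of the expected cost of the policy u_t = -K[t] x_t."""
    total = 0.0
    for _ in range(n_mc):
        x, c = np.array([1.0, 0.0]), 0.0
        for t in range(T):
            u = -K[t] @ x
            c += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + sigma * rng.standard_normal(n)
        total += c + x @ Q @ x          # terminal cost
    return total / n_mc

# Two-point (smoothed) gradient estimate over the stacked time-varying gains.
K = [np.zeros((m, n)) for _ in range(T)]
eta, r = 0.02, 0.1                      # step size and smoothing radius (assumed)
for _ in range(50):
    U = [rng.standard_normal((m, n)) for _ in range(T)]
    g = (rollout_cost([K[t] + r * U[t] for t in range(T)])
         - rollout_cost(K)) / r
    K = [K[t] - eta * g * U[t] for t in range(T)]

final_cost = rollout_cost(K)
```

The two-point perturbation estimator here stands in for the model-free gradient estimation typically analyzed in this line of work; with known parameters the exact policy gradient would be used instead.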

Cited by 8 publications (18 citation statements)
References 22 publications
“…3) with optimal K_t = K_t^* as given by (2.5). We have the following result on the well-definedness of P_t^K, the proof of which can be found in [29].…”
Section: Related Work
confidence: 93%
“…To utilize Lemmas 3.6 and 3.7 in the proof of Theorem 3.3, we need to further bound P_t and Σ^K, which is provided below in Lemma 3.8. The proof can be found in [29].…”
Section: Related Work
confidence: 98%