2022
DOI: 10.48550/arxiv.2202.00048
Preprint

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

Abstract: We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $O(\epsilon^{-1} \log(\epsilon^{-1})^{2})$. The method in the proof is applicable to general single time-scale bilevel optimization problems. We also numerically validate our theoretical results on the convergence.
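
To make the single time-scale idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a discrete-time, noise-free LQR with known matrices, which the abstract does not specify. The critic takes one damped Bellman (Lyapunov) step per iteration as a model-based stand-in for the sample-based LSTD estimate, and the actor takes one natural policy gradient step using the current critic. The function names, the step sizes `alpha` and `beta`, and the example system are all hypothetical choices made for illustration.

```python
import numpy as np

def riccati_reference(A, B, Q, R, iters=2000):
    """Reference optimum via plain Riccati value iteration (model-based)."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        G = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

def single_timescale_ac(A, B, Q, R, alpha=0.05, beta=0.5, iters=500):
    """Actor and critic are each updated once per iteration with constant steps.

    Policy: u = -K x. Critic parameter: P, an estimate of the cost matrix of K.
    Assumption: the critic step uses the known model instead of LSTD on samples.
    """
    n, m = B.shape
    K = np.zeros((m, n))   # assumes A itself is stable, so K = 0 is stabilizing
    P = np.zeros((n, n))
    for _ in range(iters):
        A_cl = A - B @ K
        # Critic target: one application of the Bellman operator for policy K.
        P_target = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        # Natural policy gradient direction, evaluated with the current critic.
        E = (R + B.T @ P @ B) @ K - B.T @ P @ A
        # Simultaneous updates: neither loop is run to convergence.
        P = (1.0 - beta) * P + beta * P_target
        K = K - 2.0 * alpha * E
    return P, K

# Hypothetical 2-state example system.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)

P_ref, K_ref = riccati_reference(A, B, Q, R)
P_ac, K_ac = single_timescale_ac(A, B, Q, R)
print("gain error:", np.linalg.norm(K_ac - K_ref))
```

In a two time-scale or nested-loop variant, the critic would be solved (nearly) to convergence before each actor step; here both parameters move together, which is the structure the abstract's bilevel remark refers to.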

Cited by 3 publications (6 citation statements)
References 13 publications
“…Before proceeding, the following assumptions are required, which are standard in the theoretical analysis of AC methods (Wu et al 2020; Fu, Yang, and Wang 2020; Yang et al 2019; Zhou and Lu 2022).…”
Section: Main Theory | mentioning
confidence: 99%
“…The zeroth-order methods and the policy iteration method are included for completeness. In particular, we note that Zhou and Lu (2022) analyzed the finite-time convergence under a single-timescale stepsize and multi-sample setting. The analysis requires the strong assumption on the uniform boundedness of the critic parameters.…”
Section: Introduction | mentioning
confidence: 99%
“…Nevertheless, we will present numerical examples to support this assumption. Moreover, the assumption for the existence of stationary distribution is common and has been widely used in Zhou and Lu (2022); Olshevsky and Gharesifard (2022).…”
Section: Main Theory | mentioning
confidence: 99%
“…For a general MDP, the best known sample complexity for both single-loop and nested-loop approaches is $O(\epsilon^{-2})$ (Chen et al, 2021; Olshevsky & Gharesifard, 2022; Xu et al, 2020a; Suttle et al, 2023). The only exception is (Zhou & Lu, 2022), which is due to the special structure of the LQR problem. These studies all use a fixed function class in the critic, and therefore, the convergence error consists of a non-diminishing constant term of $\varepsilon_{\text{critic}}$.…”
Section: Related Work | mentioning
confidence: 99%