2021
DOI: 10.48550/arxiv.2109.14756
Preprint

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Abstract: We study a novel two-time-scale stochastic gradient method for solving optimization problems where the gradient samples are generated from a time-varying Markov random process parameterized by the underlying optimization variable. These time-varying samples make the stochastic gradient biased and dependent, which can potentially lead to the divergence of the iterates. To address this issue, we consider a two-time-scale update scheme, where one scale is used to estimate the true gradient from the Markovian samp…
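
The two-time-scale idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the quadratic objective, the AR(1) noise model, the constant step sizes, and the name `two_time_scale_sgd` are all assumptions made for illustration, and the truncated abstract does not state what the second scale does, so the slow update of the optimization variable below is also an assumption. A fast running average estimates the gradient from dependent samples, and a slower step moves the variable along that estimate.

```python
import random


def two_time_scale_sgd(steps=20000, alpha=0.005, beta=0.05, seed=0):
    """Toy two-time-scale scheme on f(x) = 0.5 * (x - 3)^2 (hypothetical example).

    beta (fast step) drives the gradient estimate y.
    alpha (slow step, alpha < beta) drives the variable x.
    """
    rng = random.Random(seed)
    x = 0.0      # optimization variable, updated on the slow scale
    y = 0.0      # running estimate of the gradient, updated on the fast scale
    noise = 0.0  # state of a simple Markov (AR(1)) noise process

    for _ in range(steps):
        # Markovian noise whose scale depends on the current x, so the
        # gradient samples are dependent over time and tied to the iterate.
        noise = 0.5 * noise + rng.gauss(0.0, 0.1 * (1.0 + abs(x)))
        sample = (x - 3.0) + noise  # noisy sample of the true gradient x - 3

        y += beta * (sample - y)    # fast scale: track the gradient
        x -= alpha * y              # slow scale: descend along the estimate
    return x


if __name__ == "__main__":
    print(two_time_scale_sgd())
```

With these assumed step sizes the iterate should settle near the minimizer x = 3; a decaying step-size schedule would be the more usual choice when exact convergence is wanted.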

citations
Cited by 4 publications
(7 citation statements)
references
References 27 publications
“…In this work, we fill in this gap by properly selecting and balancing the step sizes through a four-time-scale analysis. Compared with the existing works in two-time-scale stochastic approximation/optimization (Doan, 2020; Hong et al, 2020; Zeng et al, 2021; Doan, 2021b,a; Chen et al, 2019) and analysis of actor-critic algorithms involving up to three time scales (Wu et al, 2020; Khodadadian et al, 2021), the additional time scale(s) of this work further complicates the analysis.…”
Section: Technical Challenges and Solution Sketch
Citation type: mentioning
confidence: 95%
“…Using this property, different variants of policy gradient methods have been shown to return a global optimal policy (Agarwal et al, 2020; Mei et al, 2020). For the linear quadratic regulator (LQR) problem, the stronger Polyak-Lojasiewicz condition has been shown to hold and has been used to provide convergence guarantees for the policy gradient method (Fazel et al, 2018; Yang et al, 2019; Zeng et al, 2021). Online actor-critic methods for unconstrained MDPs have also been studied in both the tabular (Khodadadian et al, 2021) and linear function approximation (Wu et al, 2020) settings.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Several critic steps are performed between two actor steps and their final sample complexity is O(ε^{-5}). Zeng et al (2021) study a bilevel optimization problem which is applied to two time-scale actor-critic algorithm on LQR. They obtain a complexity of O(ε^{-3/2}).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The major difficulty of a bilevel optimization problem is that when the lower level problem is not solved exactly, the error could propagate to the higher level problem and accumulate in the algorithm. One approach to overcome this problem is the two time-scale method (Konda and Tsitsiklis, 2000; Wu et al, 2020; Zeng et al, 2021), where the update of lower level problem is in a time scale that is much faster than the higher level one. This method suffers from high computational cost because of the lower level optimization.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Technical Approach. The key idea in our analysis is to introduce a new composite Lyapunov function with respect to the time-scale separation in the network, inspired by singular perturbation theory [15] and recent analysis for the centralized two-time-scale stochastic approximation [16], [17], [18]. Our approach is different than the typical singular perturbation approach reported in [5], [10], [7] in that it is not based on reducing the system model into two smaller models.…”
Section: Introduction
Citation type: mentioning
confidence: 99%