2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
DOI: 10.1109/allerton.2018.8636078

Concentration bounds for two time scale stochastic approximation

Abstract: Viewing a two time scale stochastic approximation scheme as a noisy discretization of a singularly perturbed differential equation, we obtain a concentration bound for its iterates that captures its behavior with quantifiable high probability. This uses Alekseev's nonlinear variation of constants formula and a martingale concentration inequality, and extends the corresponding results for single time scale stochastic approximation.
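
To make the abstract concrete, here is a minimal Python sketch (not from the paper) of a generic two-time-scale stochastic approximation iteration. The drift functions h (fast) and g (slow), the step-size sequences a_n and b_n with b_n = o(a_n), and the additive zero-mean noise model are all illustrative assumptions.

```python
import numpy as np

def two_time_scale_sa(h, g, x0, y0, n_iters=10_000, noise_std=0.01, seed=0):
    """Illustrative two-time-scale stochastic approximation iteration.

    x is the fast iterate (step size a_n), y the slow iterate (step size b_n),
    with b_n / a_n -> 0: x sees y as quasi-static while y sees x as almost
    equilibrated, which is the singularly perturbed ODE viewpoint of the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    for n in range(1, n_iters + 1):
        a_n = 1.0 / n                        # fast step size
        b_n = 1.0 / (n * np.log(n + 2.0))    # slow step size, b_n = o(a_n)
        # zero-mean noise standing in for the martingale-difference terms
        m_x = noise_std * rng.standard_normal(x.shape)
        m_y = noise_std * rng.standard_normal(y.shape)
        x = x + a_n * (h(x, y) + m_x)        # fast update
        y = y + b_n * (g(x, y) + m_y)        # slow update
    return x, y
```

For example, calling two_time_scale_sa(lambda x, y: -(x - y), lambda x, y: -y, 1.0, 1.0) has the fast iterate track the slow one while both drift towards 0, mirroring the fast/slow ODE pair behind the scheme.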

Cited by 19 publications (19 citation statements). References 13 publications (29 reference statements).

“…Actor-critic method was first proposed in [20] as a two-time-scale stochastic approximation [5,6,9,19] variant of the policy gradient algorithm [37], where a faster time scale is used to collect samples for gradient estimation.…”
Section: Related Work (mentioning, confidence: 99%)
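
To illustrate the two-time-scale structure mentioned in the excerpt above, the following is a minimal tabular actor-critic sketch in Python. It is not the algorithm of [20] or of the present paper; the environment interface (reset() returning a state, step(a) returning (next_state, reward, done)), the softmax parameterization, and the step-size choices are hypothetical and chosen only to show the fast critic / slow actor split.

```python
import numpy as np

def actor_critic_sketch(env, n_states, n_actions, episodes=500, gamma=0.99, seed=0):
    """Illustrative tabular actor-critic as two-time-scale stochastic approximation.

    The critic V is updated on the fast time scale (step size alpha_t) and the
    softmax actor parameters theta on the slow time scale (step size beta_t),
    with beta_t / alpha_t -> 0, so the actor effectively sees a converged critic.
    `env` is a hypothetical interface: reset() -> state, step(a) -> (state, reward, done).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)                      # critic: state-value estimates
    theta = np.zeros((n_states, n_actions))     # actor: softmax policy logits
    t = 1
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            logits = theta[s] - theta[s].max()
            probs = np.exp(logits) / np.exp(logits).sum()
            a = rng.choice(n_actions, p=probs)
            s_next, r, done = env.step(a)
            alpha_t = 1.0 / t ** 0.6            # fast (critic) step size
            beta_t = 1.0 / t                    # slow (actor) step size
            td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
            V[s] += alpha_t * td_error          # critic update (fast time scale)
            grad_log = -probs
            grad_log[a] += 1.0                  # gradient of log pi(a|s) w.r.t. theta[s]
            theta[s] += beta_t * td_error * grad_log   # actor update (slow time scale)
            s, t = s_next, t + 1
    return V, theta
```
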
“…The term ξ_t/t is due to the variation of t over time, and the term β_t is due to the exponential weight update of policy π_t in Algorithm 1. Hence, by definition of θ_t in (6) and by the update (9), and due to the Lipschitzness of the Q^π function, we have…”
Section: T T+1 (mentioning, confidence: 99%)
“…Notable examples are [7,8], which prove asymptotic convergence of TD(λ). Recently, finite-time performance of single-agent stochastic approximation and TD algorithms has been studied in [9]–[17]; many other works have now appeared that perform finite-time analysis for other RL algorithms, see, e.g., [18]–[28], just to name a few. Many distributed reinforcement learning algorithms have now been proposed in the literature.…”
Section: Related Work (mentioning, confidence: 99%)
“…We end by pointing at some recent papers that build upon the ideas discussed here, thereby illustrating the usefulness of this work. In [14] and [8], concentration bounds have been obtained for two-timescale SA; the first one deals with the linear case, while the second one handles the generic nonlinear setup. Separately, [24] studies constant stepsize SA used to track a slowly moving target and provides bounds on the tracking error.…”
Section: Since (mentioning, confidence: 99%)
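
As a rough illustration of the tracking setting mentioned at the end of this excerpt (not the analysis of [24]), the sketch below runs a constant step-size SA iterate against a slowly drifting scalar target; the step size, drift rate, and noise level are arbitrary choices. With a constant step size the tracking error typically settles into a band governed by the step size, noise, and drift rather than vanishing, which is the regime where tracking-error bounds are informative.

```python
import numpy as np

def constant_stepsize_tracking(n_iters=5_000, step=0.05, drift=1e-3, noise_std=0.1, seed=0):
    """Illustrative constant step-size SA iterate chasing a slowly moving target."""
    rng = np.random.default_rng(seed)
    theta, x = 0.0, 0.0                      # target and iterate
    errors = np.empty(n_iters)
    for t in range(n_iters):
        theta += drift                                    # slowly moving target
        obs = theta + noise_std * rng.standard_normal()   # noisy observation of the target
        x += step * (obs - x)                             # constant step-size SA update
        errors[t] = abs(x - theta)                        # tracking error
    return errors
```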