2020
DOI: 10.48550/arxiv.2004.12956
Preprint

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Cited by 14 publications (38 citation statements)
References 22 publications
“…An immediate future work is to employ our analysis for a sample based setting implementation of NPG, also known as Natural actor-critic. Recently, there has been a line of work on the analysis of actor-critic type algorithms [23,28,32,33,14,13], where [13] characterizes the best convergence result of O(1/k^{1/3}). By employing the improved convergence rate of NPG proposed in this paper, we believe that it is possible to improve the rate of the stochastic variant.…”
Section: Discussion (mentioning)
confidence: 99%
“…While the asymptotic convergence of actor-critic methods including natural actor-critic is well-understood by using the ODE approach [5,20], their finite-time convergence is largely unknown until recently [22,31,43,45]. The authors in [22,31] provide the rates of actor-critic where the parameter of the critic is updated by using a number of collected samples instead of only one single sample.…”
Section: Related Work (mentioning)
confidence: 99%
“…Such a setting, referred to as batch actor-critic, cannot be implemented in an online fashion since at any iteration the critic has to implement the current policy in a number of time steps to collect enough data. A similar batch approach was used in [45,46] to study natural actor-critic and in [36] the TRPO algorithm, which is another variant of mirror descent. A different approach was taken in [23,40] to obtain finite time bounds, where a setting of iid sampled data is considered.…”
Section: Related Work (mentioning)
confidence: 99%