2022
DOI: 10.1609/aaai.v36i8.20891

Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Abstract: Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with L2 integrable gradient. We provide…
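For orientation, below is a minimal, generic sketch of the kind of method the abstract refers to: a vanilla policy gradient (REINFORCE) update with a tabular softmax policy on a toy two-state MDP. This is only an illustration of the general policy gradient scheme, not the authors' algorithm, their weakly smooth policy class, or their analysis; the MDP, step size, and episode count are arbitrary choices made for the example.

# Generic REINFORCE-style policy gradient sketch (illustration only; not the
# paper's algorithm or its weakly smooth policy parameterization).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))    # tabular softmax policy parameters
# Toy reward: action 0 is best in state 0, action 1 is best in state 1.
reward = np.array([[1.0, 0.0], [0.0, 1.0]])

def policy(theta, s):
    z = theta[s] - theta[s].max()          # stabilized softmax
    p = np.exp(z)
    return p / p.sum()

alpha, horizon, episodes = 0.1, 10, 500    # step size, episode length, iterations
for _ in range(episodes):
    s = rng.integers(n_states)
    score, ret = np.zeros_like(theta), 0.0
    for _ in range(horizon):
        p = policy(theta, s)
        a = rng.choice(n_actions, p=p)
        ret += reward[s, a]
        g = -p
        g[a] += 1.0                        # grad of log softmax: e_a - p
        score[s] += g
        s = rng.integers(n_states)         # toy dynamics: uniform next state
    theta += alpha * ret * score           # REINFORCE ascent step

for s in range(n_states):
    print("state", s, "action probabilities", policy(theta, s))

After training, the policy concentrates probability on the rewarding action in each state; convergence guarantees of the type the paper establishes concern exactly this kind of iterative ascent, under weaker regularity assumptions on the policy class than in prior analyses.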

Cited by 2 publications (2 citation statements)
References 42 publications
“…(2021) show that the policy gradient approach with a risk‐neutral expectation has global convergence guarantees. On the other hand, it may not converge to a global optimum when the objective function is a dynamic risk measure (Huang et al., 2021). It thus remains an open challenge to prove that actor–critic algorithms with dynamic convex risk measures converge to an optimal policy when both the value function and policy are characterized by ANNs.…”
Section: Discussion
confidence: 99%
“…(2021); Kose and Ruszczynski (2021); Huang et al. (2021) by focusing on the broad class of dynamic convex risk measures and consider finite‐horizon problems with nonstationary policies; (ii) we devise an actor–critic algorithm to solve this class of RL problems using neural networks to allow continuous state–action spaces; (iii) we derive a recursive formula for efficiently computing the policy gradients; and (iv) we demonstrate the performance and flexibility of our proposed approach on three important applications: optimal trading for statistical arbitrage, hedging financial options, and obstacle avoidance in robot control. We demonstrate that our approach appropriately accounts for uncertainty and leads to strategies that mitigate risk.…”
Section: Introduction
confidence: 99%