Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Zhang, Matthew Shunshi; Erdogdu, Murat A.; Garg, Animesh

doi:10.1609/aaai.v36i8.20891

Cited by 2 publications (2 citation statements)

References 42 publications


“…(2021) show that the policy gradient approach with a risk‐neutral expectation has global convergence guarantees. On the other hand, it may not converge to a global optimum when the objective function is a dynamic risk measure (Huang et al., 2021). It thus remains an open challenge to prove that actor–critic algorithms with dynamic convex risk measures converge to an optimal policy when both the value function and policy are characterized by ANNs.…”

Section: Discussion (mentioning; confidence: 99%)

“…(2021); Kose and Ruszczynski (2021); Huang et al. (2021) by focusing on the broad class of dynamic convex risk measures and consider finite‐horizon problems with nonstationary policies; (ii) we devise an actor–critic algorithm to solve this class of RL problems using neural networks to allow continuous state–action spaces; (iii) we derive a recursive formula for efficiently computing the policy gradients; and (iv) we demonstrate the performance and flexibility of our proposed approach on three important applications: optimal trading for statistical arbitrage, hedging financial options, and obstacle avoidance in robot control. We demonstrate that our approach appropriately accounts for uncertainty and leads to strategies that mitigate risk.…”

Section: Introduction (mentioning; confidence: 99%)


Reinforcement learning with dynamic convex risk measures

Coache and Jaimungal, 2023, Mathematical Finance


We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization problems using model‐free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time‐consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
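The time-consistent dynamic programming principle in this abstract evaluates risk recursively, one period at a time, instead of applying a single risk measure to the total cost. The sketch below illustrates only that recursion, not the authors' actor–critic algorithm. It assumes CVaR as the one-step risk measure (CVaR is coherent, hence a convex risk measure) and a hypothetical two-period cost tree in which each period independently costs +1 or -1 with probability 1/2; the helper names `cvar`, `nested_risk`, and `static_risk` are illustrative.

```python
from itertools import product

# Hypothetical cost model: each period independently costs +1 or -1 with prob. 1/2.
STEP_COSTS = [1.0, -1.0]
STEP_PROBS = [0.5, 0.5]


def cvar(values, probs, alpha):
    """CVaR_alpha of a discrete cost distribution: mean of the worst (1 - alpha) tail."""
    tail = 1.0 - alpha
    if tail <= 0.0:
        return max(values)  # alpha = 1 degenerates to the worst case
    remaining, acc = tail, 0.0
    # Walk outcomes from largest cost to smallest, taking probability mass until the tail is filled.
    for v, p in sorted(zip(values, probs), reverse=True):
        take = min(p, remaining)
        acc += take * v
        remaining -= take
        if remaining <= 1e-15:
            break
    return acc / tail


def nested_risk(t, horizon, alpha):
    """Time-consistent value V_t = CVaR_alpha(c_t + V_{t+1}), with V_horizon = 0."""
    if t == horizon:
        return 0.0
    # Costs here do not depend on the state, so V_{t+1} is a single number.
    v_next = nested_risk(t + 1, horizon, alpha)
    return cvar([c + v_next for c in STEP_COSTS], STEP_PROBS, alpha)


def static_risk(horizon, alpha):
    """CVaR_alpha applied once to the total cost over all paths (not time-consistent)."""
    totals, probs = [], []
    for path in product(range(len(STEP_COSTS)), repeat=horizon):
        totals.append(sum(STEP_COSTS[i] for i in path))
        p = 1.0
        for i in path:
            p *= STEP_PROBS[i]
        probs.append(p)
    return cvar(totals, probs, alpha)


print(nested_risk(0, 2, 0.5), static_risk(2, 0.5))
```

With alpha = 0.5 the nested (time-consistent) value is 2.0 while CVaR of the total cost is 1.0, so in this toy tree the recursive evaluation is the more conservative of the two; with alpha = 0 both reduce to the risk-neutral expectation, 0.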


Geometry and convergence of natural policy gradient methods

Müller and Montúfar, 2023, Info. Geo.


We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the inverse penalization strength.
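As a concrete special case of the NPG methods discussed here, the sketch below applies Kakade's natural gradient (the gradient preconditioned by the pseudo-inverse of the Fisher information) to a softmax policy on a single-state bandit with exact gradients. This is a textbook toy setting chosen for illustration, not the Hessian-geometry analysis of the paper; the reward vector, step size, and the helper names `softmax` and `npg_step` are hypothetical. In this setting the NPG step reduces to the multiplicative-weights update π_{k+1}(a) ∝ π_k(a) exp(η r(a)).

```python
import numpy as np


def softmax(theta: np.ndarray) -> np.ndarray:
    z = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return z / z.sum()


def npg_step(theta: np.ndarray, rewards: np.ndarray, eta: float) -> np.ndarray:
    """One natural policy gradient step for J(theta) = softmax(theta) . rewards."""
    pi = softmax(theta)
    # Fisher information of a softmax policy: E[(e_a - pi)(e_a - pi)^T].
    fisher = np.diag(pi) - np.outer(pi, pi)
    grad = fisher @ rewards  # exact gradient of J
    # The Fisher matrix is singular along the all-ones direction; any component
    # there only shifts all logits equally and leaves the policy unchanged.
    natural_grad = np.linalg.pinv(fisher) @ grad
    return theta + eta * natural_grad


rewards = np.array([1.0, 0.5, 0.0])  # hypothetical arm rewards
eta, steps = 0.5, 10
theta = np.zeros_like(rewards)
gaps = []  # suboptimality max(r) - E_pi[r] after each step
for _ in range(steps):
    theta = npg_step(theta, rewards, eta)
    gaps.append(rewards.max() - softmax(theta) @ rewards)

print(softmax(theta))
print(softmax(steps * eta * rewards))  # closed form predicted by multiplicative weights
```

Starting from θ = 0, the two printed policies agree up to floating-point error, and the suboptimality gap decreases monotonically and, asymptotically, at a geometric rate, which is consistent with the linear convergence the abstract reports for Kakade's metric.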
