2021
DOI: 10.48550/arxiv.2105.11066
Preprint
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

Abstract: Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL). In addition to value maximization, other practical considerations commonly arise, including the need to encourage exploration and to ensure certain structural properties of the learned policy due to safety, resource, and operational constraints. These considerations can often be accounted for by resorting to …
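As a rough illustration of the regularized mirror-descent template the abstract refers to (a minimal sketch, not the paper's algorithm), one entropy-regularized policy mirror descent step on a tabular problem can be written in closed form. The function name `pmd_step`, the stepsize `eta`, and the regularization weight `tau` are illustrative assumptions:

```python
import numpy as np

# One entropy-regularized policy mirror descent (PMD) step on a tabular MDP,
# assuming the state-action values Q have already been estimated.
# Solving  max_p <Q(s,.), p> - tau*h(p) - (1/eta)*KL(p || pi_k(.|s))
# (h = negative entropy) gives the standard closed-form update:
#   pi_{k+1}(a|s) ∝ pi_k(a|s)^{1/(1+eta*tau)} * exp(eta*Q(s,a)/(1+eta*tau))
def pmd_step(pi, Q, eta=1.0, tau=0.1):
    logits = (np.log(pi) + eta * Q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy example: 2 states, 3 actions, uniform initial policy.
pi0 = np.full((2, 3), 1.0 / 3.0)
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
pi1 = pmd_step(pi0, Q)
```

Setting `tau = 0` recovers the unregularized natural-policy-gradient-style multiplicative update; a positive `tau` shrinks the policy toward uniform, which is what drives the linear (and, per the citing works, locally superlinear) convergence analyses.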

Cited by 15 publications (35 citation statements)
References 34 publications
“…It is also worth mentioning that by extending similar arguments, the local superlinear convergence can also be established for the approximate policy mirror descent (APMD) method developed in [13,26].…”
Section: Local Superlinear Convergence and Implicit Regularization
confidence: 97%
“…In a nutshell, the HPMD method can be considered as a simplification of the approximate policy mirror descent (APMD) method proposed in [13] and extended in [26], by dropping the need for evaluating the perturbed state-action value function Q^{π_k}_{τ_k}, defined by…”
Section: Homotopic Policy Mirror Descent: Linear Convergence
confidence: 99%