2021
DOI: 10.48550/arxiv.2106.04096
Preprint

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

Cited by 8 publications (14 citation statements). References 13 publications.
“…For large state or action spaces, there have also been fruitful developments in extending policy gradient methods to learning the optimal policy within a parameterized function class [23,17]. The convergence of these methods, on the other hand, appears more subtle than for their tabular counterparts, and often requires additional technical assumptions on the function class and the underlying MDP [1,18,25,3].…”
Section: Introduction (mentioning)
confidence: 99%
“…The Fisher-non-degenerate setting implicitly guarantees that the agent is able to explore the state-action space under the considered policy class. Similar Fisher-non-degeneracy conditions are also required in other global-optimum convergence frameworks (Assumption 6.5 in [1] on the relative condition number and Assumption 3 in [6] on the regularity of the parametric model). Assumption 13 is satisfied by a wide family of policies, including the Gaussian policy (134) and certain neural policies.…”
Section: A4 Proof of RL Results (mentioning)
confidence: 99%
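The excerpt above refers to the Fisher-non-degeneracy condition, which Gaussian policies are said to satisfy. The following is a minimal numerical sketch, not taken from either paper, assuming a one-dimensional action, a linear-Gaussian policy with a hypothetical feature map, and a uniform exploration distribution; it estimates the Fisher information matrix by Monte Carlo and checks that its smallest eigenvalue stays bounded away from zero.

```python
import numpy as np

# Sketch (assumed setup): pi_theta(a|s) = N(theta^T phi(s), sigma^2) and a
# Monte-Carlo estimate of the Fisher information
#   F(theta) = E[ grad_theta log pi * (grad_theta log pi)^T ].
# Fisher non-degeneracy corresponds to the smallest eigenvalue of F(theta)
# being strictly positive, i.e. the features excite every parameter direction.

rng = np.random.default_rng(0)
d, sigma = 4, 0.5            # feature dimension and fixed policy std-dev
theta = rng.normal(size=d)   # policy parameters (arbitrary for this check)

def phi(s):
    # hypothetical state features; any linearly independent feature map works
    return np.array([1.0, s, s**2, np.sin(s)])

def score(s, a):
    # grad_theta log pi_theta(a|s) for the linear-Gaussian policy
    f = phi(s)
    return (a - theta @ f) / sigma**2 * f

F = np.zeros((d, d))
n = 20000
for _ in range(n):
    s = rng.uniform(-1.0, 1.0)               # assumed exploration distribution
    a = theta @ phi(s) + sigma * rng.normal() # action sampled from the policy
    g = score(s, a)
    F += np.outer(g, g) / n

print("smallest eigenvalue of estimated Fisher information:",
      np.linalg.eigvalsh(F).min())
```

A strictly positive printed value illustrates, for this toy linear-Gaussian case, the non-degeneracy property the citing paper assumes; neural policies require a separate argument.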
“…We need this assumption because of the specific form of the error of the Hessian estimator in the recursion inequality in (6). As a result of the assumption in (8), using a version of the matrix moment inequality (see Lemma 3 in the Appendix), we show that the dependence of the Hessian sample complexity in Theorem 1 on the dimension d is of order log d.…”
Section: SCRN Under Gradient Dominance Property With (mentioning)
confidence: 99%
“…Note that, unlike MDPs, POMDPs may admit only strictly non-deterministic policies [20]. Furthermore, if one employs entropy regularization within the NAC framework, as is commonly done in practice, π* automatically satisfies Condition 1 [34,35,12].…”
Section: Memory-Inference Error Tradeoff for Sliding-Block Controllers (mentioning)
confidence: 99%
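The last excerpt relies on a standard fact about entropy regularization: the regularized optimal policy takes a softmax form and therefore assigns strictly positive probability to every action. The snippet below is a small illustration of that fact under assumed, hypothetical Q-values; it is not code from either paper.

```python
import numpy as np

# With entropy regularization at temperature tau > 0, the regularized optimal
# policy has the softmax form  pi*_tau(a|s) proportional to exp(Q*_tau(s,a)/tau),
# so every action keeps strictly positive probability -- the policy is
# automatically strictly non-deterministic, as the quote above uses.

def soft_optimal_policy(q_values, tau):
    z = q_values / tau
    z = z - z.max()            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 0.2, -3.0, 0.2])   # hypothetical Q*_tau(s, .) for one state
for tau in (1.0, 0.1, 0.01):
    pi = soft_optimal_policy(q, tau)
    print(tau, pi, bool((pi > 0).all()))   # all probabilities remain > 0
```

As tau decreases the policy concentrates on the greedy action, but for any tau > 0 no action's probability reaches zero, which is the property invoked in the citing paper's Condition 1.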