2019 IEEE 58th Conference on Decision and Control (CDC)
DOI: 10.1109/cdc40024.2019.9029198
Policy Improvement Directions for Reinforcement Learning in Reproducing Kernel Hilbert Spaces

Cited by 3 publications (1 citation statement)
References 7 publications
“…Building on our preliminary results [21], we establish in Theorem 1 that the gradients of the value function at any state are also ascent directions of the value function at the initial state. Leveraging this result, we address the convergence of the online policy gradient algorithm to a neighborhood of the critical points in Theorem 2, hence dropping the assumption of the convergence to, and existence of, the stationary distribution over states for every intermediate policy.…”
Section: Introduction
Confidence: 81%
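The statement above concerns online policy gradient updates that use the gradient estimated at the currently visited state as an ascent direction, rather than averaging over a stationary state distribution. As a rough illustration of that idea (not the paper's RKHS method — a simplified score-function sketch on a two-armed bandit, with all names and hyperparameters chosen here for the example), one can apply each per-step gradient estimate immediately:

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def online_policy_gradient(true_means, steps=5000, lr=0.1, seed=0):
    """Online score-function (REINFORCE-style) updates on a two-armed
    Gaussian bandit: each step's gradient estimate is used directly as
    an ascent direction, without waiting for a stationary distribution
    over states to form. Illustrative sketch only."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # softmax preferences for the two arms
    for _ in range(steps):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(true_means[action], 1.0)
        # grad of log pi(action) w.r.t. preference i under softmax:
        # (1 if i == action else 0) - probs[i]
        for i in range(2):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad_log_pi
    return softmax(prefs)


final_probs = online_policy_gradient([1.0, 0.0])
```

After training, the policy should concentrate most of its probability on the higher-mean arm, even though no stationary distribution was ever assumed during the updates.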