2021
DOI: 10.48550/arxiv.2106.11612
Preprint

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

Abstract: We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the stro…

Cited by 0 publications
References 9 publications