2020
DOI: 10.48550/arxiv.2003.04069
Preprint

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Abstract: Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZOOMRL, an online algorithm that leverages ideas from continuous ba…
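The abstract assumes only that the state-action space carries "a natural metric" and does not say which one. As a minimal sketch, assuming states and actions are real vectors, one common choice is to take the larger of the Euclidean state distance and the Euclidean action distance. The function names below are hypothetical and this is not claimed to be the metric used by ZOOMRL.

```python
import math
from typing import Sequence

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def state_action_distance(s1: Sequence[float], a1: Sequence[float],
                          s2: Sequence[float], a2: Sequence[float]) -> float:
    """Hypothetical metric on S x A: the max of the state and action distances.

    The max of two metrics is itself a metric on the product space, so two
    (state, action) pairs count as close only if they are close in BOTH parts.
    """
    return max(euclidean(s1, s2), euclidean(a1, a2))

# Usage: states in R^2, actions in R^1.
print(state_action_distance([0.0, 0.0], [1.0], [3.0, 4.0], [2.0]))  # 5.0
```

Taking the max (rather than a sum) keeps balls in S × A shaped like products of a state ball and an action ball, which is convenient when a discretization splits the space into such cells.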

Cited by 4 publications (9 citation statements)
References 12 publications
“…• Theorem 1 gives a regret bound that depends on the packing numbers of the near-optimal set (Definition 3). This bound should be compared with the "metric-specific" regret guarantee of Sinclair et al (2019) or the "refined regret bound" of Touati et al (2020). Both of these results have the same form as ours with all terms in agreement, but with $N^{\mathrm{pack}}_r(\mathcal{S} \times \mathcal{A})$ in the place of $N^{\mathrm{pack}}_r(P^{Q}_{h,r})$.…”
Section: Results (supporting)
confidence: 64%
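The statement above contrasts the packing number of the whole space, $N^{\mathrm{pack}}_r(\mathcal{S} \times \mathcal{A})$, with that of the near-optimal set $P^{Q}_{h,r}$. Since any r-packing of a subset is also an r-packing of the full space, the packing number of the near-optimal set can never exceed that of the whole space. A rough sketch of the gap, under several assumptions: $\mathcal{S} \times \mathcal{A}$ is replaced by a finite grid on $[0,1]^2$ with the sup metric, the "near-optimal set" is a hypothetical thin band around the diagonal (not Definition 3 from the paper), and points in a packing must be more than r apart. A greedy pass yields a maximal packing, whose size is a lower bound on the true packing number.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def greedy_packing(points: Sequence[Point], r: float,
                   dist: Callable[[Point, Point], float]) -> List[Point]:
    """Keep each point that is more than r away from every point kept so far.

    The result is a maximal r-packing of `points`, so its size lower-bounds
    the r-packing number of that finite set.
    """
    centers: List[Point] = []
    for p in points:
        if all(dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

def sup_dist(x: Point, y: Point) -> float:
    """Sup (L-infinity) distance between two points."""
    return max(abs(a - b) for a, b in zip(x, y))

# Grid with spacing 0.05 as a stand-in for S x A = [0, 1]^2.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
# Hypothetical near-optimal set: a thin band around the diagonal.
band = [p for p in grid if abs(p[0] - p[1]) <= 0.12]

r = 0.22
full = greedy_packing(grid, r, sup_dist)
near = greedy_packing(band, r, sup_dist)
print(len(full), len(near))  # 25 7
```

In this toy setting the full space needs a 5 × 5 arrangement of well-separated points, while the band admits only a chain along the diagonal, which is the kind of saving a bound in terms of the near-optimal set is meant to capture.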
“…Of these, the most related result is that of Sinclair et al (2019) who study the adaptive discretization algorithm and give a worst-case regret analysis, showing that the algorithm has a regret rate of $K^{(d+1)/(d+2)}$ where $d$ is the covering dimension of the metric space. Essentially the same results appear in Touati et al (2020), although the algorithm is slightly different. However, none of these results give sharper instance-dependence guarantees that reflect benign problem structure, as we will obtain.…”
Section: Related Work (supporting)
confidence: 56%
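To see what a worst-case rate of $K^{(d+1)/(d+2)}$ implies, note that dividing by the number of episodes $K$ gives an average regret per episode of $K^{-1/(d+2)}$. This goes to zero, but more slowly as the covering dimension $d$ grows. The sketch below assumes all constants and other problem-dependent factors can be dropped; the function names are hypothetical.

```python
def worst_case_regret_exponent(d: float) -> float:
    """Exponent of K in the K^((d+1)/(d+2)) worst-case regret rate."""
    return (d + 1) / (d + 2)

def per_episode_regret(K: int, d: float) -> float:
    """Average regret per episode, K^((d+1)/(d+2)) / K = K^(-1/(d+2)), constants dropped."""
    return K ** worst_case_regret_exponent(d) / K

# Higher covering dimension: exponent closer to 1, slower per-episode decay.
for d in (1, 2, 4):
    print(d, worst_case_regret_exponent(d), per_episode_regret(10**6, d))
```

For example, with $d = 2$ the exponent is $3/4$, so after $10^6$ episodes the per-episode term is $10^{-1.5} \approx 0.03$ (up to the dropped constants).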