2020
DOI: 10.48550/arxiv.2006.10875
Preprint

Provably adaptive reinforcement learning in metric spaces

Abstract: We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for …
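For context, a minimal sketch of the zooming dimension in this setting, following the bandit-style definition the abstract alludes to; the gap constant c, the scaling constant C, and the exact form of the near-optimal set Z_r are assumptions here and vary across papers:

\[
Z_r \;=\; \bigl\{ (x,a) \;:\; V^*(x) - Q^*(x,a) \le c\,r \bigr\},
\qquad
d_z \;=\; \min\bigl\{ d \ge 0 \;:\; \mathcal{N}_r(Z_r) \le C\,r^{-d}\ \text{for all } r \in (0,1] \bigr\},
\]

where \(\mathcal{N}_r(Z_r)\) is the number of metric balls of radius \(r\) needed to cover \(Z_r\). Since \(Z_r\) is a subset of the full state-action space, \(d_z\) never exceeds the covering dimension, which is the comparison the abstract draws.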

Cited by 1 publication (1 citation statement). References 7 publications (15 reference statements).
“…On the other hand, due to recent successes of reinforcement learning (RL) in the control of physical systems (Yang et al., 2019; OpenAI et al., 2019; Hwangbo et al., 2019; Williams et al., 2017; Levine et al., 2016), there has been a flurry of research in online RL algorithms for continuous control. In contrast to the classical setting of adaptive nonlinear control, online RL algorithms operate in discrete time, and often come with finite-time regret bounds (Wang et al., 2019; Cao and Krishnamurthy, 2020; Cai et al., 2020; Agarwal et al., 2020). These bounds provide a quantitative rate at which the control performance of the online algorithm approaches the performance of an oracle equipped with hindsight knowledge of the uncertainty.…”
Section: Introduction (mentioning)
confidence: 99%
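As a point of reference, a minimal sketch of the episodic finite-time regret such bounds control; the notation (K episodes, policies \(\pi_k\), initial states \(x_1^k\)) is an assumption here rather than taken from the citing paper:

\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V^*\bigl(x_1^k\bigr) - V^{\pi_k}\bigl(x_1^k\bigr) \Bigr),
\]

where \(V^*\) is the optimal value and \(V^{\pi_k}\) the value of the policy played in episode \(k\); sublinear growth in \(K\) means the learner's per-episode performance approaches that of the hindsight oracle.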