2021
DOI: 10.48550/arxiv.2103.12690
Preprint

An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

Abstract: A fundamental question in the theory of reinforcement learning is: if the optimal Q-function lies in the linear span of a given d-dimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? A recent and remarkable result resolved this question in the negative, providing an exponential (in d) sample-size lower bound that holds even if the agent has access to a generative model of the environment. One may hope that this information-theoretic barrier for RL can be circumven…
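The abstract's central assumption — "linear realizability" — says the optimal Q-function is a linear function of a known d-dimensional feature map. A minimal sketch of that condition, with hypothetical names (`phi`, `theta_star`) not taken from the paper: with enough generic (state, action) pairs the parameter is recoverable by least squares, which makes the paper's point striking — even under this assumption, *sample-based* learning can require exp(d) data.

```python
import numpy as np

# Illustrative sketch (not from the paper): linear realizability says
#     Q*(s, a) = phi(s, a) . theta*   for some theta* in R^d,
# where phi is a known d-dimensional feature map. All names here are
# hypothetical placeholders.

d = 4
rng = np.random.default_rng(0)
theta_star = rng.normal(size=d)  # the unknown "true" parameter

def phi(state, action):
    # Toy feature map: a fixed pseudo-random embedding per (state, action).
    seed = hash((state, action)) % (2**32)
    return np.random.default_rng(seed).normal(size=d)

def q_star(state, action):
    # Under the assumption, Q* lies exactly in the span of the features.
    return phi(state, action) @ theta_star

# Given >= d generic (s, a) pairs with exact Q* values, theta* is
# recoverable by ordinary least squares. The paper's lower bound concerns
# the much harder problem of learning from sampled transitions.
pairs = [(s, a) for s in range(3) for a in range(2)]
Phi = np.stack([phi(s, a) for s, a in pairs])
q = np.array([q_star(s, a) for s, a in pairs])
theta_hat, *_ = np.linalg.lstsq(Phi, q, rcond=None)
print(np.allclose(theta_hat, theta_star))
```

The exact recovery above only works because we queried noiseless Q*-values directly; the lower bound shows no such shortcut exists when the agent must learn from samples, even with a constant suboptimality gap.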

Cited by 10 publications (11 citation statements)
References 17 publications
“…A subject of much recent interest in the RL community is that of RL with function approximation. The majority of attention has been devoted to MDPs with linear structure (Jin et al., 2020b; Wang et al., 2019; Du et al., 2019; Zanette et al., 2020a,b; Ayoub et al., 2020; Jia et al., 2020; Weisz et al., 2021; Zhou et al., 2020, 2021; Zhang et al., 2021b; Wang et al., 2021; Wagenmaker et al., 2021a). Several different settings of MDPs with linear structure have been proposed.…”
Section: Related Work
confidence: 99%
“…Several recent works have extended their results significantly (Du et al., 2021; Jin et al., 2021). In the special case of linear function approximation, a vast body of recent work exists (Jin et al., 2020b; Wang et al., 2019; Du et al., 2019; Zanette et al., 2020a,b; Ayoub et al., 2020; Jia et al., 2020; Weisz et al., 2021; Zhou et al., 2020, 2021; Zhang et al., 2021; Wang et al., 2021). A variety of assumptions are made in these works, and we highlight two of them in particular.…”
Section: Related Work
confidence: 94%
“…Concentrability places stronger restrictions on the data distribution and underlying dynamics, and always implies identifiability when the state and action spaces are finite. Further work in this direction includes (1) Zanette (2021), who provides a slightly more general lower bound for linear realizability, and (2) lower bounds for online reinforcement learning with linear realizability (Du et al., 2020; Weisz et al., 2021; Wang et al., 2021b). It is worth noting that Zanette (2021) provides a lower bound with a policy-induced data distribution, where over-coverage cannot occur; however, concentrability is not satisfied by his construction.…”
Section: Related Work
confidence: 99%