2020
DOI: 10.48550/arxiv.2010.01374
Preprint

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

Cited by 7 publications (8 citation statements)
References: 0 publications
“…In the remainder of this paper, we will refer to them as the Nash Bellman operator and µ-Bellman operator, respectively. It is known that RL with function approximation is in general statistically intractable without further assumptions (see, e.g., hardness results in [29,55]). Below, we present two assumptions that are generalizations of commonly adopted assumptions in MDP literature.…”
Section: Function Approximation (mentioning)
Confidence: 99%
“…Thus, this result immediately sheds light on the challenges involved in achieving minimax optimal regret for general RL with linear function approximation. Reinforcement Learning with Linear Function Approximation: Recent years have witnessed a flurry of activity on RL with linear function approximation (e.g., Jiang et al., 2017; Yang and Wang, 2019a,b; Jin et al., 2020; Wang et al., 2019; Modi et al., 2020; Dann et al., 2018; Du et al., 2019; Sun et al., 2019; Zanette et al., 2020a,b; Cai et al., 2019; Jia et al., 2020; Ayoub et al., 2020; Weisz et al., 2020; Zhou et al., 2020; He et al., 2020a). These results can be generally grouped into four categories based on their assumptions on the underlying MDP.…”
Section: Related Work (mentioning)
Confidence: 99%
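The assumption at issue in these excerpts can be made concrete. Below is a generic sketch of linear Q*-realizability; the symbols φ, θ*_h, d, and H are illustrative notation chosen here, not taken from any of the quoted papers.

% Linear Q^*-realizability (generic sketch; \phi, \theta_h^*, d, H are
% illustrative notation, not quoted from the cited works): a known feature
% map \phi : S \times A \to \mathbb{R}^d and unknown parameters \theta_h^*
% are assumed to satisfy
\[
  Q_h^*(s,a) \;=\; \langle \phi(s,a), \theta_h^* \rangle
  \qquad \text{for all } s \in S,\; a \in A,\; h \in [H].
\]
% As the paper's title indicates, this realizability assumption alone still
% permits exponential lower bounds for planning.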
“…Exploration has been widely studied in the tabular setting (Azar et al., 2017; Zanette and Brunskill, 2019; Efroni et al., 2019; Jin et al., 2018; Dann et al., 2019; Zhang et al., 2020; Russo, 2019), but obtaining formal guarantees for exploration with function approximation is a challenge even in the linear case due to recent lower bounds (Du et al., 2019; Weisz et al., 2020; Zanette, 2020; Wang et al., 2020a). When the action-value function is only approximately linear, several ideas from tabular exploration and linear bandits (Lattimore and Szepesvári, 2020) have been combined to obtain provably efficient algorithms in low-rank MDPs (Yang and Wang, 2020; Zanette et al., 2020a; Jin et al., 2020) and their extensions (Wang et al., 2019, 2020b).…”
Section: B Additional Related Literature (mentioning)
Confidence: 99%
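The low-rank (linear) MDP structure that this excerpt contrasts with the lower bounds can also be stated in one display. This is a generic sketch with illustrative notation (φ, μ_h, η_h), not a quotation from the cited works.

% Low-rank (linear) MDP structure (generic sketch, illustrative notation):
% transitions and rewards both factor through the same feature map \phi.
\[
  P_h(s' \mid s,a) \;=\; \langle \phi(s,a), \mu_h(s') \rangle,
  \qquad
  r_h(s,a) \;=\; \langle \phi(s,a), \eta_h \rangle .
\]
% This is strictly stronger than assuming only that Q^* is linear in \phi,
% which is consistent with efficient exploration guarantees being available
% in this setting despite the lower bounds cited above.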
“…These conditions are in some sense necessary, especially for high dimensional problems; otherwise, the learner in the worst case would require exponentially many samples before discovering any useful information (see e.g. (Kakade et al., 2003; Krishnamurthy et al., 2016; Weisz et al., 2020)). However, these provably efficient RL algorithms are typically not robust to model misspecification, because their performance guarantees allow for only small ℓ∞-bounded perturbations from their assumptions.…”
Section: Introduction (mentioning)
Confidence: 99%
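The ℓ∞-bounded misspecification mentioned in the last excerpt can be written as follows; again this is a generic sketch with illustrative notation (φ, θ, ε) rather than the quoted paper's own formulation.

% Approximate (misspecified) linear realizability: Q^* is only
% \varepsilon-close in sup norm to the linear function class.
% Notation is illustrative, not quoted from the cited works.
\[
  \inf_{\theta \in \mathbb{R}^d} \;
  \sup_{(s,a) \in S \times A}
  \bigl| Q^*(s,a) - \langle \phi(s,a), \theta \rangle \bigr|
  \;\le\; \varepsilon .
\]
% Guarantees that tolerate only a small \varepsilon of this kind are the
% "small \ell_\infty-bounded perturbations" referred to in the excerpt above.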