2021
DOI: 10.48550/arxiv.2103.06671
Preprint

Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

Abstract: This paper studies the statistical theory of offline reinforcement learning with deep ReLU networks. We consider the off-policy evaluation (OPE) problem where the goal is to estimate the expected discounted reward of a target policy given the logged data generated by unknown behaviour policies. We study a regression-based fitted Q evaluation (FQE) method using deep ReLU networks and characterize a finite-sample bound on the estimation error of this method under mild assumptions. The prior works in OPE with eit…
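
For readers unfamiliar with FQE, below is a minimal sketch of the regression-based fitted Q evaluation procedure described in the abstract: each iteration regresses a deep ReLU network onto Bellman targets built from the logged transitions and the target policy. The names and shapes (state_dim, num_actions, target_policy, the dataset layout) and all hyperparameters are illustrative assumptions, not the paper's exact construction.

```python
# Minimal FQE sketch with a deep ReLU network (illustrative, not the paper's exact method).
import torch
import torch.nn as nn

gamma = 0.99                     # discount factor (assumed)
state_dim, num_actions = 8, 4    # hypothetical problem sizes


def make_q_net():
    # Deep ReLU network mapping a state to a vector of action values Q(s, .).
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, num_actions),
    )


def fqe(dataset, target_policy, num_iters=50, epochs=20, lr=1e-3):
    """dataset: dict of tensors s, a, r, s_next logged by unknown behaviour policies.
    target_policy: maps a batch of states to action probabilities under the target policy."""
    s, a, r, s_next = dataset["s"], dataset["a"], dataset["r"], dataset["s_next"]
    q = make_q_net()
    for _ in range(num_iters):
        # Freeze the previous iterate and build regression targets
        # y = r + gamma * E_{a' ~ pi(.|s')} Q_k(s', a').
        with torch.no_grad():
            q_next = q(s_next)                          # (N, A)
            pi_next = target_policy(s_next)             # (N, A) action probabilities
            y = r + gamma * (pi_next * q_next).sum(-1)  # (N,)
        # Fit Q_{k+1} by least-squares regression onto the targets.
        q_new = make_q_net()
        opt = torch.optim.Adam(q_new.parameters(), lr=lr)
        for _ in range(epochs):
            pred = q_new(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(pred, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        q = q_new
    return q  # estimate of Q^pi for the target policy


# The OPE estimate of the policy value is then obtained by averaging the learned Q,
# weighted by the target policy, over a sample of initial states.
```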

Cited by 3 publications (3 citation statements)
References 19 publications
“…Our ongoing work Li et al (2022) suggests that a new variant of pessimistic model-based algorithm is sample-optimal for a broader range of ε, which in turn motivates further investigation into whether model-free algorithms can accommodate a broader ε-range too without compromising sample efficiency. Moving beyond the tabular setting, it would be of great importance to extend the algorithmic and theoretical framework to accommodate low-complexity function approximation (Nguyen-Tang et al, 2021).…”
Section: Discussion
confidence: 99%
“…A more recent line of work has studied variants of fitted Q-iteration (FQI) using neural network approximation, and provided statistical guarantees under different notions of smoothness. For example, Fan et al [10] exploited the Hölder smoothness of the range of Bellman operator to derive bounds on estimation error; Nguyen-Tang et al [28] approximated deep ReLU networks using Besov classes; and Long et al [22] analyzed two-layer neural networks based on neural tangent kernels or Barron spaces. All these works contribute to the understanding of empirical success of deep reinforcement learning.…”
Section: Related Work
confidence: 99%
“…Parallel to its practical significance, recently there is a surge of theoretical investigations towards offline RL via two threads: offline policy evaluation (OPE), where the goal is to estimate the value of a target (fixed) policy V π (Li et al, 2015; Jiang & Li, 2016; Wang et al, 2017; Liu et al, 2018; Kallus & Uehara, 2020; Uehara & Jiang, 2019; Feng et al, 2019; Nachum et al, 2019; Xie et al, 2019; Yin & Wang, 2020; Kato et al, 2020; Duan et al, 2020; Feng et al, 2020; Zhang et al, 2020b; Kuzborskij et al, 2020; Wang et al, 2020b; Zhang et al, 2021; Uehara et al, 2021; Nguyen-Tang et al, 2021; Hao et al, 2021; Xiao et al, 2021) and offline (policy) learning which intends to output a near-optimal policy (Antos et al, 2008a,b; Chen & Jiang, 2019; Le et al, 2019; Xie & Jiang, 2020a,b; Liu et al, 2020b; Hao et al, 2020; Zanette, 2020; Jin et al, 2020c; Hu et al, 2021; Yin et al, 2021a; Ren et al, 2021; Rashidinejad et al, 2021). Yin et al (2021b) initiates the studies for offline RL from the new perspective of uniform convergence in OPE (uniform OPE for short) which unifies OPE and offline learning tasks.…”
Section: Introduction
confidence: 99%