2021
DOI: 10.48550/arxiv.2104.07794
Preprint

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

Abstract: Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analyses of such algorithms give rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made using either the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm. […]
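For orientation, the abstract refers to fitted Q-iteration with kernel function approximation. Below is a minimal, hypothetical sketch of generic fitted Q-iteration using kernel ridge regression (scikit-learn's KernelRidge). The toy MDP, kernel parameters, and sample sizes are illustrative assumptions, not the paper's setup or experiments.

```python
# Hedged sketch of fitted Q-iteration with kernel ridge regression.
# Toy 1-D MDP and all hyperparameters are assumptions for illustration.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
discount = 0.9                    # discount factor (assumed)
n_samples, n_actions = 500, 3     # assumed toy sizes

# Synthetic transitions (s, a, r, s') from an arbitrary toy dynamic.
S = rng.uniform(-1, 1, size=(n_samples, 1))
A = rng.integers(n_actions, size=n_samples)
R = np.cos(3 * S[:, 0]) - 0.1 * A
S_next = np.clip(S + 0.1 * (A[:, None] - 1), -1, 1)

# One kernel ridge regressor per action; Q_0 = 0 before the first fit.
models = [None] * n_actions

def q_values(states):
    """Stack Q(s, a) over all actions; zero before the first fit."""
    if models[0] is None:
        return np.zeros((len(states), n_actions))
    return np.column_stack([m.predict(states) for m in models])

for k in range(50):  # fitted Q-iteration loop
    # Regression targets: y = r + discount * max_a' Q_k(s', a')
    y = R + discount * q_values(S_next).max(axis=1)
    new_models = []
    for a in range(n_actions):
        mask = A == a
        m = KernelRidge(alpha=1e-2, kernel="rbf", gamma=5.0)
        m.fit(S[mask], y[mask])
        new_models.append(m)
    models = new_models

print(q_values(np.array([[0.0]])))  # learned Q(0, ·) for each action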

Cited by 6 publications (28 citation statements)
References 18 publications
“…4. We give a concrete form of the perturbation response by distribution mismatch (Lemma 2) and show that when the assumptions on concentration coefficients in the existing literature [18,10,17,28] are satisfied, or the eigenvalue decay of the kernel is fast, Δ_M(ε) decays fast with respect to ε (Proposition 5 and Lemma 3).…”
Section: Our Contribution
confidence: 99%
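As a hedged illustration of the "fast eigenvalue decay" condition mentioned above (stated here in standard RKHS conventions, which may differ from the cited work's exact formulation), the decay refers to the Mercer eigenvalues of the kernel's integral operator:

```latex
% Hedged illustration, not necessarily the paper's exact statement:
% (\lambda_j) are the Mercer eigenvalues of the integral operator
%   (Kf)(x) = \int k(x, y)\, f(y)\, d\mu(y).
% "Fast decay" typically means polynomial or exponential decay:
\[
  \lambda_j \le C\, j^{-\beta} \ (\beta > 1)
  \qquad \text{or} \qquad
  \lambda_j \le C\, e^{-c j} \ (c > 0),
\]
% under which \Delta_M(\epsilon) is claimed to decay quickly in \epsilon.
```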
“…Based on the type of assumptions used, these works can be divided into two categories. The first category of upper bounds [13,43,44] depends on the eigenvalue decay of the kernel, while the second category [19,28] requires access to reference distributions that uniformly bound all possible state-action distributions under admissible policies (the assumption on concentration coefficients). In this work, we show that the perturbational complexity Δ_M(ε) decays fast in both situations, and we establish an upper bound for the fitted reward algorithm (see Algorithm 2 in Section 5.1) and the fitted Q-iteration algorithm (see Algorithm 3 in Section 5.2) under the assumption that Δ_M(ε) decays fast.…”
Section: Our Contribution
confidence: 99%
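The concentration-coefficient assumption described in the quote above admits a standard form from the fitted Q-iteration literature; the following is a hedged sketch, and the cited works may state it differently:

```latex
% Hedged sketch (standard "concentrability" form; the cited works may
% state the assumption differently): there exist a reference
% distribution \mu over state-action pairs and a constant C < \infty with
\[
  \left\| \frac{d\rho_\pi}{d\mu} \right\|_\infty \le C
  \quad \text{for every admissible policy } \pi,
\]
% where \rho_\pi is the state-action distribution induced by \pi.
```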