2020
DOI: 10.48550/arxiv.2007.06184
Preprint

Efficient Planning in Large MDPs with Weak Linear Function Approximation

Abstract: Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm pr…
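As a rough formalization of the two assumptions in the abstract (the notation φ, θ*, ε, and the core set C is ours, not quoted from the paper; the convex-hull form of the core-state condition follows the phrasing of the citation statements below):

\[
  \text{(weak realizability)}\qquad
  \sup_{s \in \mathcal{S}} \bigl|\, V^*(s) - \varphi(s)^\top \theta^* \,\bigr| \;\le\; \varepsilon
  \quad\text{for some } \theta^* \in \mathbb{R}^d,
\]
\[
  \text{(core states)}\qquad
  \varphi(s) \;=\; \sum_{c \in \mathcal{C}} \alpha_c(s)\,\varphi(c),
  \qquad \alpha_c(s) \ge 0,\;\; \sum_{c \in \mathcal{C}} \alpha_c(s) = 1,
  \quad\text{for every } s \in \mathcal{S},
\]

with the core set C small relative to the state space and, as the abstract stresses, no representability assumption on the value functions of non-optimal policies. Under these conditions the paper's goal is a planner whose runtime does not depend on the number of states.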

Cited by 5 publications (9 citation statements)
References 24 publications
“…However, the sample complexity in Du et al. [2019c] has at least linear dependency on the number of actions, whereas our sample complexity in Section 5 has no dependency on the size of the action space. Finally, Shariff and Szepesvári [2020] obtain a polynomial upper bound under the realizability assumption when the features for all state-action pairs are inside the convex hull of a polynomial-sized coreset and the generative model is available to the agent.…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, Du et al. (2020a) examined how the model misspecification error propagates and impacts the sample efficiency of policy learning. A related line of work assumed that the linear function class is closed or has low approximation error under the Bellman operator, referred to as low inherent Bellman error (Munos, 2005; Shariff and Szepesvári, 2020; Zanette et al., 2019, 2020b).…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…Since these works need to find a globally optimal design or barycentric spanner, their computational complexities depend polynomially on the size of the state space. Under the V*-realizability assumption (i.e., the optimal value function is linear in some feature map), Shariff and Szepesvári [2020] proposed a planning algorithm assuming the availability…”
[Caption of the citing paper's Table 1, spliced into the excerpt: Recent advances on RL algorithms with linear function approximation under different assumptions. Positive results mean the query complexity depends only polynomially on the relevant parameter, while negative results refer to an exponential lower bound on the query complexity.]
Section: Related Work (mentioning)
confidence: 99%
“…†: The algorithms in these works are not query or computationally efficient unless the agent is provided with an approximate optimal design [Lattimore et al., 2020], a barycentric spanner [Du et al., 2020], or "core states" [Shariff and Szepesvári, 2020]. [Table 1 row fragment: Q*-realizability, constant gap, N/A] …online access of a set of core states, but obtaining such core states can still be computationally inefficient. Zanette et al. [2019] proposed an algorithm that uses a similar concept named anchor points but only provided a greedy heuristic to generate these points.…”
Section: Related Work (mentioning)
confidence: 99%