2022
DOI: 10.48550/arxiv.2202.00063
Preprint

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

Abstract: We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent states discovery, exploration, and exploitation together, and can provably learn a near-optimal policy with sample complexity scaling polynomially in the number of latent states, actions, and…

Citations: cited by 1 publication (1 citation statement)
References: 11 publications
“…Uehara et al (2021) proposed provably efficient model-based algorithms for both online and offline RL with known reward. Du et al (2019); Misra et al (2020); Zhang et al (2022) studied the block MDP which is a special case of low-rank MDPs. Further, algorithms have been proposed for MDP models with low Bellman rank (Jiang et al, 2017), low witness rank (Sun et al, 2019b), bilinear classes (Du et al, 2021) and low Bellman eluder dimension (Jin et al, 2021a), which can be specialized to low-rank MDPs.…”
Section: Related Work
Citation type: mentioning
confidence: 99%