2022
DOI: 10.48550/arxiv.2202.03983
Preprint

Provable Reinforcement Learning with a Short-Term Memory

Abstract: Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonl…

Cited by 6 publications (26 citation statements)
References 13 publications
“…For example, block MDPs satisfy Assumption 1 with α ≥ 1/√O. We remark that most existing results for block MDPs or m-step decodable POMDPs (see, e.g., Krishnamurthy et al., 2016; Jiang et al., 2017; Du et al., 2019; Misra et al., 2020; Efroni et al., 2022) further involve a decoder class or value function approximation, which is beyond the scope of this paper.…”
Section: Related Work
confidence: 92%
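The condition quoted above can be illustrated numerically, under the (assumed) reading that Assumption 1 asks the emission matrix to have smallest singular value at least α. In a block MDP each latent state emits into a disjoint block of observations, so the emission matrix's columns are orthogonal. A minimal sketch with a hypothetical 4-observation, 2-state emission matrix (all values illustrative):

```python
import numpy as np

def weakly_revealing_alpha(emission):
    """Smallest singular value of the emission matrix (observations x states)."""
    return np.linalg.svd(emission, compute_uv=False)[-1]

# Toy block-MDP emission: state 0 emits uniformly over observations {0, 1},
# state 1 uniformly over {2, 3}; the blocks are disjoint, so the columns
# are orthogonal and the matrix is well-conditioned.
O = np.array([
    [0.5, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
])
alpha = weakly_revealing_alpha(O)  # = sqrt(0.5), above the 1/sqrt(|O|) = 0.5 bound
```

With 4 observations, 1/√O = 0.5, and the computed α ≈ 0.707 clears it, matching the worst-case bound quoted above.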
“…Block MDPs (Krishnamurthy et al., 2016) are POMDPs whose current latent state can be uniquely determined by the current observation. m-step decodable POMDPs (Efroni et al., 2022) are their generalizations, whose latent state can be uniquely determined by the most recent history (of observations and actions) of a short length m. Block MDPs and m-step decodable POMDPs are special cases of single-step and m-step weakly revealing POMDPs, respectively. For example, block MDPs satisfy Assumption 1 with α ≥ 1/√O.…”
Section: Related Work
confidence: 99%
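As a concrete (hypothetical) toy of the m-step decodability described above, consider a process whose latent state is the parity of the last m observed bits: no single observation decodes it, but any length-m window does (a block MDP is the m = 1 special case). For simplicity the sketch uses observations only, omitting actions from the window:

```python
from collections import deque

M = 3  # window length; the latent state depends on the last M observations

def decode_from_window(window):
    """Recover the latent state (here: parity) from the last M observations."""
    assert len(window) == M
    return sum(window) % 2

# Maintain a sliding length-M history while observing a trajectory.
history = deque(maxlen=M)
for obs in [1, 0, 1, 1, 0]:
    history.append(obs)

state = decode_from_window(list(history))  # window is [1, 1, 0] -> state 0
```

The `deque(maxlen=M)` keeps exactly the short-term memory the quoted definition requires: older observations are discarded because they carry no additional information about the latent state.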
“…In particular, Azizzadenesheli et al. (2016); Guo et al. (2016); Jin et al. (2020a); Liu et al. (2022) consider tabular POMDPs with (left-)invertible emission matrices. Efroni et al. (2022) consider POMDPs where the state is fully determined by the most recent observations of a fixed length. Cayci et al. (2022) analyze POMDPs where a finite internal state can approximately determine the state.…”
Section: Introduction
confidence: 99%
“…Similarly, Cayci et al (2022) consider POMDPs with a finite concentrability coefficient (Munos, 2003;Chen and Jiang, 2019), where the visitation density of an arbitrary policy is close to that of the optimal policy. In contrast, Jin et al (2020a); Efroni et al (2022); Cai et al (2022) consider POMDPs where strategic exploration is necessary.…”
Section: Introduction
confidence: 99%