2022
DOI: 10.48550/arxiv.2205.13589
Preprint

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Abstract: We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy that may depend on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm, which addresses…
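The confounding described in the abstract can be made concrete with a small simulation. Below is a minimal sketch (Python, with made-up sizes and distributions; not code from the paper) of the data-collection process it describes: a latent state drives both the emitted observation and the behavior policy's action, so the logged observation-action-reward tuples are confounded by a variable that never appears in the dataset.

```python
# Minimal sketch of confounded offline data collection in a tabular POMDP.
# The latent state s drives both the observation o and the behavior action a,
# but only (o, a, r) is logged. All sizes and distributions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_obs, n_actions, horizon = 3, 4, 2, 20

T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(s' | s, a)
O = rng.dirichlet(np.ones(n_obs), size=n_states)                  # P(o | s)
R = rng.uniform(size=(n_states, n_actions))                       # E[r | s, a]
behavior = rng.dirichlet(np.ones(n_actions), size=n_states)       # pi_b(a | s): depends on the latent state

def collect_episode():
    """Log only (o, a, r); the latent state s stays hidden and confounds the data."""
    s = rng.integers(n_states)
    episode = []
    for _ in range(horizon):
        o = rng.choice(n_obs, p=O[s])             # observation caused by latent s
        a = rng.choice(n_actions, p=behavior[s])  # action also caused by latent s
        r = R[s, a] + 0.1 * rng.standard_normal()
        episode.append((o, a, r))
        s = rng.choice(n_states, p=T[s, a])
    return episode

dataset = [collect_episode() for _ in range(500)]
```

Because the behavior policy conditions on s while the learner only sees o, naively treating the observations as states biases standard offline RL; this is the setting the paper's proxy-variable approach targets.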

Cited by 1 publication (1 citation statement)
References 35 publications
“…In the meanwhile, to ensure identifiability, Namkoong et al (2020) consider the case where the unmeasured confounders affect only one of the decisions made. Very recently, there is a stream of research focused on using proximal causal inference (Tchetgen et al, 2020) for off-policy evaluation and learning in the partially observed MDP (Bennett et al, 2021;Shi et al, 2021;Lu et al, 2022).…”
Section: Data Coverage
confidence: 99%
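The citing passage points to proximal causal inference as the identification tool for confounded off-policy evaluation. As a rough illustration of that idea in the simplest single-decision, fully discrete case (not the setting or code of any cited paper), the sketch below recovers a counterfactual mean despite an unmeasured confounder U by solving for an outcome bridge function using two proxies Z and W; all sizes and distributions are invented.

```python
# Discrete proximal causal inference, single decision: with unmeasured U,
# a treatment-side proxy Z and an outcome-side proxy W, a bridge function
# h(w, a) solving  E[Y | A=a, Z=z] = sum_w h(w, a) P(W=w | A=a, Z=z)
# identifies E[Y(a)] = sum_w h(w, a) P(W=w).  Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_u, n_z, n_w, n_a = 3, 3, 3, 2

p_u = rng.dirichlet(np.ones(n_u))
p_z_given_u = rng.dirichlet(np.ones(n_z), size=n_u)   # Z depends only on U
p_w_given_u = rng.dirichlet(np.ones(n_w), size=n_u)   # W depends only on U
p_a_given_u = rng.dirichlet(np.ones(n_a), size=n_u)   # confounded treatment
mean_y = rng.uniform(size=(n_u, n_a))                  # E[Y | U=u, A=a]

truth = p_u @ mean_y  # ground-truth counterfactual means E[Y(a)]

est = np.zeros(n_a)
for a in range(n_a):
    # Population quantities given A=a (in practice, empirical estimates).
    post = p_u[:, None] * p_a_given_u[:, a, None] * p_z_given_u  # P(u, a, z), shape (n_u, n_z)
    post /= post.sum(axis=0, keepdims=True)                      # P(U=u | A=a, Z=z)
    M = post.T @ p_w_given_u        # (n_z, n_w): P(W=w | A=a, Z=z)
    b = post.T @ mean_y[:, a]       # (n_z,):    E[Y | A=a, Z=z]
    h, *_ = np.linalg.lstsq(M, b, rcond=None)   # bridge function h(., a)
    est[a] = p_u @ p_w_given_u @ h              # E[h(W, a)] under the marginal of W

print("true E[Y(a)]:     ", truth)
print("proximal estimate:", est)
```

Under the usual completeness conditions (here, generic full-rank proxy matrices), the printed estimate matches the ground truth even though U is never observed; the papers cited above extend this bridge-function idea to sequential off-policy evaluation and learning in POMDPs.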