2022
DOI: 10.48550/arxiv.2206.03446
Preprint

Learning in Observable POMDPs, without Computationally Intractable Oracles

Abstract: Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g. deterministic transitions) or assume access to an oracle for solving a hard optimistic planning or estimation problem as a subroutine. In this work we develop the first oracle-free learning algorithm for POMDPs…

Cited by 1 publication (2 citation statements) | References 17 publications
“…As a special case of POMDPs, we may consider applying the POMDP solutions for learning a near-optimal policy in RMMDPs. There is a growing body of work that focuses on the case when single or multiple-step observations from test action sequences are sufficient statistics of the environment (e.g., [4,28,2,19,14,34,46]). In such a scenario, latent model parameters can be learned up to some parameter transformations when the system is irreducible or optimistically explored.…”
Section: Solutions for General POMDPs (mentioning; confidence: 99%)
“…proposed in [34,19]). The work in [30] does not give a satisfactory solution either since it requires a similar assumption of strong separability between latent contexts.…”
Section: Introduction (mentioning; confidence: 99%)