2021
DOI: 10.48550/arxiv.2102.12661
Preprint

Online Learning for Unknown Partially Observable MDPs

Mehdi Jafarnia-Jahromi,
Rahul Jain,
Ashutosh Nayyar

Abstract: Solving Partially Observable Markov Decision Processes (POMDPs) is hard. Learning optimal controllers for POMDPs when the model is unknown is harder. Online learning of optimal controllers for unknown POMDPs, which requires efficient learning using regret-minimizing algorithms that effectively trade off exploration and exploitation, is even harder, and no solution currently exists. In this paper, we consider infinite-horizon average-cost POMDPs with an unknown transition model but a known observation model. We p…

Cited by 1 publication (1 citation statement)
References 19 publications
“…Reinforcement learning in POMDPs. Our work is related to the recent line of research on developing provably efficient online RL methods for POMDPs (Guo et al, 2016; Krishnamurthy et al, 2016; Jin et al, 2020; Xiong et al, 2021; Jafarnia-Jahromi et al, 2021; Efroni et al, 2022; Liu et al, 2022). In the online setting, the actions are specified by history-dependent policies and thus the latent state does not directly affect the actions.…”
Section: Related Work (mentioning)
confidence: 98%