Bayesian Reinforcement Learning in Factored POMDPs

Katt, Sammie; Oliehoek, Frans A.; Amato, Christopher

doi:10.48550/arxiv.1811.05612

Cited by 1 publication (1 citation statement)

“…With our natural definition of regret, their algorithm suffers linear regret. Other learning algorithms for POMDPs either consider linear dynamics (Lale et al., 2020b; Tsiamis & Pappas, 2020) or do not consider regret (Shani et al., 2005; Ross et al., 2007; Poupart & Vlassis, 2008; Cai et al., 2009; Liu et al., 2011; Doshi-Velez et al., 2013; Katt et al., 2018; Azizzadenesheli et al., 2018) and are not directly comparable to our setting.…”

Section: Related Literature; citation type: mentioning (confidence: 88%)

Online Learning for Unknown Partially Observable MDPs

Jafarnia-Jahromi, Jain, Nayyar (2021), preprint


Solving Partially Observable Markov Decision Processes (POMDPs) is hard. Learning optimal controllers for POMDPs when the model is unknown is harder. Online learning of optimal controllers for unknown POMDPs, which requires efficient learning using regret-minimizing algorithms that effectively trade off exploration and exploitation, is even harder, and no solution currently exists. In this paper, we consider infinite-horizon average-cost POMDPs with an unknown transition model but a known observation model. We propose a natural posterior sampling-based reinforcement learning algorithm (POMDP-PSRL) and show that it achieves O(T^{2/3}) regret, where T is the time horizon. To the best of our knowledge, this is the first online RL algorithm for POMDPs with sub-linear regret.
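To make "posterior sampling" concrete, the sketch below shows only the generic sampling step used by PSRL-style methods in the fully observed setting: keep an independent Dirichlet posterior over each transition row and draw one complete model from it. This is an assumption-laden illustration, not POMDP-PSRL: in a POMDP the state is hidden, so the cited paper updates its posterior through the known observation model rather than from direct transition counts. The function name `sample_transition_model`, the `counts` array, and the uniform Dirichlet `prior` are all hypothetical choices made for this example.

```python
import numpy as np


def sample_transition_model(counts, prior=1.0, rng=None):
    """Draw one transition model P[s, a, s'] from independent Dirichlet posteriors.

    Hypothetical sketch: counts[s, a, s'] are observed transition counts, which
    assumes the state is visible (MDP-style PSRL). A POMDP method would have to
    form its posterior via the observation model instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_states, n_actions, _ = counts.shape
    model = np.empty(counts.shape, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            # Posterior for row (s, a) is Dirichlet(counts + prior) under a
            # symmetric Dirichlet(prior) prior; each draw is a probability vector.
            model[s, a] = rng.dirichlet(counts[s, a] + prior)
    return model


# Usage: with no data the draw reflects only the prior.
rng = np.random.default_rng(0)
sampled = sample_transition_model(np.zeros((3, 2, 3)), rng=rng)
```

In a posterior-sampling loop, one such model would typically be drawn at the start of each episode, a policy planned against it, and the posterior updated with the data that policy collects; exploration comes from the randomness of the draw rather than from an explicit bonus.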
