2018
DOI: 10.2139/ssrn.3195310

Robust Partially Observable Markov Decision Processes

Abstract: In a variety of applications, decisions need to be made dynamically after receiving imperfect observations about the state of an underlying system. Partially Observable Markov Decision Processes (POMDPs) are widely used in such applications. To use a POMDP, however, a decision-maker must have access to reliable estimations of core state and observation transition probabilities under each possible state and action pair. This is often challenging mainly due to lack of ample data, especially when some actions ar…

Cited by 12 publications (11 citation statements)
References 29 publications

“…This lemma unveils the connection between POMDP and SA-MDP: SA-MDP can be seen as a version of "robust" POMDP where the policy needs to be robust under a set of observational processes (adversaries). SA-MDP is different from robust POMDP (RPOMDP) (Osogami, 2015; Rasouli & Saghafian, 2018), which optimizes for the worst case environment transitions.…”
Section: Finding the Optimal Policy Under a Fixed Adversary (mentioning, confidence: 99%)
“…The ability to solve such planning and multiagent control problems under realistic real-world uncertainty highlights the fact that undecidability of optimal policies need not prevent useful (although presumably not optimal) plans and policies from being devised, at least as long as feasible solutions can be generated and improved fairly easily to obtain good (or even, with enough iterations, approximately optimal) solutions. This is the case for POMDPs with discounted reward criteria, as well as for certain extensions of POMDPs to robust optimization settings, where model parameters are known only to lie within a specified uncertainty set, and worst-case expected cumulative reward is to be maximized (Osogami, 2015; Rasouli & Saghafian, 2018). Although computational complexity remains a formidable challenge for large POMDPs, state-of-the-art solvers use a combination of ideas (including random sampling of search trees, discretization of value functions and updates ("point-based" value iteration), together with dynamic programming and linear programming techniques) to provide solutions whose quality improves with available computational budget (Shani, Pineau, & Kaplow, 2013; Smith & Simmons, 2005).…”
Section: Multiagent Team Control (mentioning, confidence: 99%)
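
The "point-based" value iteration mentioned in this excerpt backs up the value function only at a finite set of belief points. The sketch below is a minimal, nominal (non-robust) version of one such backup, assuming dense arrays T[a, s, s'], Z[a, s', o], and R[s, a]; it illustrates the general technique, not the cited solvers or the algorithm of Rasouli and Saghafian (2018).

```python
import numpy as np

def pbvi_backup(beliefs, alphas, T, Z, R, gamma):
    """One point-based value-iteration backup over a fixed set of belief points.

    beliefs : (B, S) belief points          alphas : (K, S) current alpha-vectors
    T : (A, S, S) transitions T[a, s, s']   Z : (A, S, O) observations Z[a, s', o]
    R : (S, A) immediate rewards            gamma : discount factor
    """
    n_actions = T.shape[0]
    new_alphas = []
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(n_actions):
            # g[k, o, s] = gamma * sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[k, s']
            g = gamma * np.einsum('ij,jo,kj->koi', T[a], Z[a], alphas)
            picks = np.argmax(g @ b, axis=0)                  # best alpha-vector per observation
            vec = R[:, a] + g[picks, np.arange(g.shape[1])].sum(axis=0)
            if b @ vec > best_val:
                best_val, best_vec = b @ vec, vec
        new_alphas.append(best_vec)
    return np.vstack(new_alphas)
```

Iterating such a backup from a conservative initial alpha-vector (for example, all entries set to min(R) / (1 - gamma)) until the values at the belief points stop improving gives the usual point-based approximation.
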
“…Meanwhile, Rasouli and Saghafian (2018) consider a general setting of robust POMDP, where the DM may not be able to obtain the exact information of the nature's choice of decision. In this case, the sufficient statistic is no longer a single belief state, but a collection of belief states, and the expected reward up to the current time must be taken into account to realize a policy that is robust in terms of the entire cumulative expected reward.…”
Section: Robust and Distributionally Robust MDP (mentioning, confidence: 99%)
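
The "collection of belief states" statistic described in this excerpt can be pictured as one Bayes-filtered belief per candidate model in a discrete uncertainty set. The sketch below is a simplified illustration under that assumption (the array layouts T[a, s, s'] and Z[a, s', o] are hypothetical), not the paper's exact construction.

```python
import numpy as np

def update_belief_collection(beliefs, models, action, obs):
    """Update one belief vector per candidate (T, Z) model after taking
    `action` and observing `obs`; together the vectors form the collection
    of belief states discussed above (one belief per model)."""
    updated = []
    for b, (T, Z) in zip(beliefs, models):
        predicted = b @ T[action]                # prediction step under this model
        unnorm = predicted * Z[action][:, obs]   # correction by the observation likelihood
        total = unnorm.sum()
        updated.append(unnorm / total if total > 0 else predicted)
    return updated
```
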
“…In this case, the sufficient statistic is no longer a single belief state, but a collection of belief states, and the expected reward up to the current time must be taken into account to realize a policy that is robust in terms of the entire cumulative expected reward. Rasouli and Saghafian (2018) derives an exact algorithm for the case where the uncertainty set is discrete. Here we note that robust POMDP for continuous support of uncertainty sets is computationally challenging even in a very simple setting (Nakao et al, 2019).…”
Section: Robust and Distributionally Robust MDP (mentioning, confidence: 99%)
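
To make the maximin structure over a discrete uncertainty set concrete, the toy sketch below evaluates the worst-case expected cumulative reward of fixed open-loop action sequences and picks the best one by brute-force enumeration. It only conveys the worst-case criterion on tiny problems; the exact algorithm referred to in the excerpt handles history-dependent policies and is far more involved. All names and array layouts here are illustrative assumptions.

```python
import numpy as np
from itertools import product

def worst_case_value(plan, models, R, b0, gamma):
    """Worst-case (over a discrete set of transition models T[a, s, s'])
    expected cumulative reward of a fixed open-loop action sequence."""
    values = []
    for T in models:
        b, total = b0.copy(), 0.0
        for t, a in enumerate(plan):
            total += (gamma ** t) * (b @ R[:, a])   # expected reward this period
            b = b @ T[a]                            # propagate the state distribution
        values.append(total)
    return min(values)                              # the adversary picks the worst model

def robust_open_loop_plan(horizon, models, R, b0, gamma):
    """Maximin by enumeration: best open-loop plan against the worst model."""
    n_actions = R.shape[1]
    return max(product(range(n_actions), repeat=horizon),
               key=lambda p: worst_case_value(p, models, R, b0, gamma))
```
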