Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/662

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Abstract: Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI)…
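For orientation, the indefinite-horizon (Goal-POMDP) objective sketched in the abstract can be written in the standard way. The formalization below is a reconstruction from the abstract's wording, not quoted from the paper; G denotes the target set, c > 0 the per-transition cost, b_0 the initial belief, and tau the first time a target state is reached.

% Goal-POMDP (indefinite-horizon) objective, reconstructed from the abstract.
\[
  \min_{\pi}\; \mathbb{E}^{\pi}_{b_0}\!\left[ \sum_{t=0}^{\tau - 1} c(s_t, a_t) \right],
  \qquad \tau = \inf\{\, t \ge 0 \mid s_t \in G \,\}.
\]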

Cited by 16 publications (21 citation statements). References 7 publications.

“…State beliefs are studied when verifying HMMs [59], where the question is whether a sequence of observations is likely to occur, or which HMM is an adequate representation of a system [37]. State beliefs are prominent in the verification of partially observable MDPs [16,32,40], where one can observe the actions taken (but the problem itself is to find the right scheduler). Our monitoring problem can be phrased as a special case of verification of partially observable stochastic games [20], but automatic techniques for those very general models are lacking.…”
Section: Related Work (mentioning)
confidence: 99%
“…We consider an equivalent reformulation of the POMDP as an (infinite) belief MDP: Here, each state is a distribution over POMDP states. Such a belief MDP has additional properties that have been exploited to allow verification [80,98,102]. Storm uses a combination of abstraction-and-refinement techniques to iteratively generate a finite abstract belief MDP that soundly approximates the extremal reachability probabilities in the POMDP [19].…”
Section: Partially Observable Markov Decision Processes (mentioning)
confidence: 99%
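As background for the belief-MDP reformulation quoted above, the following is a minimal sketch of the standard Bayesian belief update that induces the belief MDP: each belief is a distribution over POMDP states, and taking an action and receiving an observation maps one belief to the next. The function and variable names are illustrative, not taken from Storm or any cited tool.

import numpy as np

def belief_update(belief, action, observation, T, O):
    # belief:      length-|S| probability vector over POMDP states
    # action:      index a of the action taken
    # observation: index o of the observation received
    # T[a][s][s']: transition probability P(s' | s, a)
    # O[a][s'][o]: observation probability P(o | s', a)
    # Returns b'(s') proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).

    # Predict: push the current belief through the transition model.
    predicted = belief @ T[action]
    # Correct: reweight by the likelihood of the received observation.
    unnormalized = O[action][:, observation] * predicted
    norm = unnormalized.sum()
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / norm

# Tiny two-state example with one action and a noisy sensor.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # T[0][s][s']
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])   # O[0][s'][o]
b = np.array([0.5, 0.5])
print(belief_update(b, action=0, observation=0))  # ~[0.765, 0.235]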
“…Quantitative variants of reach-avoid specifications have gained attention in, e.g., [11,28,40]. Other approaches restrict themselves to simple policies [3,33,45,58].…”
Section: Contributions (mentioning)
confidence: 99%
“…The underlying approaches depend on discounted reward maximization and can satisfy almost-sure specifications with high reliability. However, enforcing probabilities that are close to 0 or 1 requires a discount factor close to 1, drastically reducing the scalability of such approaches [28]. Moreover, probabilities in the underlying POMDP need to be precisely given, which is not always realistic [14].…”
Section: Contributions (mentioning)
confidence: 99%
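To make the discount-factor remark above concrete: under discounted-sum semantics, reaching the goal after t steps contributes gamma**t, so distinguishing reachability probabilities close to 0 or 1 forces gamma toward 1. A rough numeric illustration (ours, not from the cited works):

# Discounted weight of a goal reached at step 100 for several discount factors.
# With gamma = 0.9 the contribution is nearly invisible; only gamma close to 1
# keeps slow-but-certain goal-reaching policies competitive.
for gamma in (0.9, 0.99, 0.999):
    print(gamma, gamma ** 100)
# 0.9   -> ~2.7e-05
# 0.99  -> ~0.366
# 0.999 -> ~0.905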