2018
DOI: 10.1609/icaps.v28i1.13882

Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces

Abstract: Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal. …
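The widening criterion the abstract refers to is compact enough to show concretely. Below is a minimal sketch, not the paper's implementation: the function name and the parameters k and alpha follow common convention, and the default values are purely illustrative. A node with N visits may have at most k * N**alpha children; "double" progressive widening applies the same test at both the action layer and the observation layer of the search tree.

def may_widen(num_children: int, num_visits: int,
              k: float = 10.0, alpha: float = 0.5) -> bool:
    """Progressive widening test: permit adding a new child only
    while the node has at most k * N**alpha children, where N is
    the node's visit count. k and alpha are tuning parameters
    (illustrative defaults, not values from the paper)."""
    return num_children <= k * num_visits ** alpha

# Double progressive widening (DPW) applies this test twice per
# simulation step: once to decide whether to sample a new action at
# a belief node, and once to decide whether a newly sampled
# observation opens a new branch or is mapped to an existing one.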

Citations: cited by 82 publications (39 citation statements). References: 20 publications.
“…In sampling-based online algorithms for POMDPs, e.g., [133], [138], [147], belief states are represented as collections of particles at decision nodes, and a simulator that allows sampling of the next state, reward, and observation is used for constructing the tree and estimating the action-value functions. It is sufficient to be able to draw samples from the state transition and observation models, as no explicit belief state tracking by Bayes filtering is necessary during planning.…”
Section: B. Online Algorithms (mentioning; confidence: 99%)
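The generative-model interface this excerpt describes is small enough to sketch. The code below is an illustration under stated assumptions, not code from any of the cited papers: the names BeliefNode, step, and simulate_from_belief are invented here. The point it shows is that a belief is just a bag of state particles and that planning needs only forward draws (s, a) -> (s', o, r) from a black-box simulator, never explicit transition or observation densities.

import random
from dataclasses import dataclass, field

@dataclass
class BeliefNode:
    """Decision node whose belief is a collection of state particles."""
    particles: list = field(default_factory=list)

def step(state, action):
    """Black-box generative model G(s, a) -> (next_state, observation,
    reward). Problem-specific; only sampling is required, not density
    evaluation. Hypothetical placeholder."""
    raise NotImplementedError

def simulate_from_belief(node: BeliefNode, action):
    """Draw one particle and push it through the simulator: this is
    all the belief handling a sampling-based planner needs at a node,
    with no Bayes filtering during planning."""
    state = random.choice(node.particles)
    return step(state, action)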
“…In the backpropagation phase, we use the information from the rollout to update the action-value estimates of nodes along the path from the root node to the leaf node. POMCPOW [147] also uses MCTS, with additional techniques applied for dealing with continuous action and observation spaces. In contrast, determinized sparse partially observable trees (DESPOT) [138] constrains the search to a finite set of randomly sampled scenarios, and builds a search tree that covers |A|^t · K nodes at depth t, where K is the number of scenarios.…”
Section: B. Online Algorithms (mentioning; confidence: 99%)
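Both mechanisms quoted above reduce to a few lines. The sketch below uses invented names (backpropagate, despot_nodes_at_depth) and the standard running-average MCTS value update rather than any one paper's exact variant.

def backpropagate(path, discounted_return):
    """MCTS backpropagation: walk from the leaf back to the root,
    incrementing visit counts and updating the running-average
    action-value estimate Q toward the rollout's discounted return.
    Assumes each node exposes mutable `visits` and `q` attributes."""
    for node in reversed(path):
        node.visits += 1
        node.q += (discounted_return - node.q) / node.visits

def despot_nodes_at_depth(num_actions: int, num_scenarios: int,
                          depth: int) -> int:
    """DESPOT's sparse tree covers |A|**t * K belief nodes at depth t,
    for K sampled scenarios -- the quantity cited in the excerpt."""
    return num_actions ** depth * num_scenarios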
“…We use the POMCP framework for AIPPMS because it is simpler. Our modifications can also be used with variations of POMCP, such as POMCPOW (Sunberg and Kochenderfer 2018) for continuous action spaces. A very recent online approach attempts to address the exploration-exploitation trade-off in informative planning by Pareto-optimal Monte Carlo Tree Search (Chen and Liu 2019) but does not allow for multimodal sensing.…”
Section: Online POMDP Planning (mentioning; confidence: 99%)
“…Moreover, the curse of history is amplified by the large action space imposed by a 6-DOF manipulator. Methods have been proposed to alleviate this issue (Seiler, Kurniawati, and Singh 2015; Sunberg and Kochenderfer 2018). However, existing solvers can only perform well for problems with 3-4 dimensional continuous action spaces (Seiler, Kurniawati, and Singh 2015), while a method that can perform well for problems with 100,000 discrete actions was only recently proposed (Wang, Kurniawati, and Kroese 2018).…”
Section: Introduction (mentioning; confidence: 99%)