2018
DOI: 10.1609/icaps.v28i1.13882

Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces

Abstract: Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal. …
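The widening criterion the abstract refers to is compact enough to show concretely. Below is a minimal sketch, not the paper's implementation: the function name and the parameters k and alpha follow common convention, and the default values are purely illustrative. A node with N visits may have at most k * N**alpha children; "double" progressive widening applies the same test at both the action layer and the observation layer of the search tree.

def may_widen(num_children: int, num_visits: int,
              k: float = 10.0, alpha: float = 0.5) -> bool:
    """Progressive widening test: permit adding a new child only
    while the node has at most k * N**alpha children, where N is
    the node's visit count. k and alpha are tuning parameters
    (illustrative defaults, not values from the paper)."""
    return num_children <= k * num_visits ** alpha

# Double progressive widening (DPW) applies this test twice per
# simulation step: once to decide whether to sample a new action at
# a belief node, and once to decide whether a newly sampled
# observation opens a new branch or is mapped to an existing one.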

Citations: cited by 82 publications (39 citation statements). References: 20 publications.
“…In sampling-based online algorithms for POMDPs, e.g., [133], [138], [147], belief states are represented as collections of particles at decision nodes, and a simulator that allows sampling of the next state, reward, and observation is used for constructing the tree and estimating the action-value functions. It is sufficient to be able to draw samples from the state transition and observation models, as no explicit belief state tracking by Bayes filtering is necessary during planning.…”
Section: B. Online Algorithms (mentioning; confidence: 99%)
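The generative-model interface this excerpt describes is small enough to sketch. The code below is an illustration under stated assumptions, not code from any of the cited papers: the names BeliefNode, step, and simulate_from_belief are invented here. The point it shows is that a belief is just a bag of state particles and that planning needs only forward draws (s, a) -> (s', o, r) from a black-box simulator, never explicit transition or observation densities.

import random
from dataclasses import dataclass, field

@dataclass
class BeliefNode:
    """Decision node whose belief is a collection of state particles."""
    particles: list = field(default_factory=list)

def step(state, action):
    """Black-box generative model G(s, a) -> (next_state, observation,
    reward). Problem-specific; only sampling is required, not density
    evaluation. Hypothetical placeholder."""
    raise NotImplementedError

def simulate_from_belief(node: BeliefNode, action):
    """Draw one particle and push it through the simulator: this is
    all the belief handling a sampling-based planner needs at a node,
    with no Bayes filtering during planning."""
    state = random.choice(node.particles)
    return step(state, action)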
“…In the backpropagation phase, we use the information from the rollout to update the action-value estimates of nodes along the path from the root node to the leaf node. POMCPOW [147] also uses MCTS, with additional techniques applied for dealing with continuous action and observation spaces. In contrast, determinized sparse partially observable trees (DESPOT) [138] constrains the search to a finite set of randomly sampled scenarios, and builds a search tree that covers |A|^t · K nodes at depth t, where K is the number of scenarios.…”
Section: B. Online Algorithms (mentioning; confidence: 99%)
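Both mechanisms quoted above reduce to a few lines. The sketch below uses invented names (backpropagate, despot_nodes_at_depth) and the standard running-average MCTS value update rather than any one paper's exact variant.

def backpropagate(path, discounted_return):
    """MCTS backpropagation: walk from the leaf back to the root,
    incrementing visit counts and updating the running-average
    action-value estimate Q toward the rollout's discounted return.
    Assumes each node exposes mutable `visits` and `q` attributes."""
    for node in reversed(path):
        node.visits += 1
        node.q += (discounted_return - node.q) / node.visits

def despot_nodes_at_depth(num_actions: int, num_scenarios: int,
                          depth: int) -> int:
    """DESPOT's sparse tree covers |A|**t * K belief nodes at depth t,
    for K sampled scenarios -- the quantity cited in the excerpt."""
    return num_actions ** depth * num_scenarios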
“…We use the POMCP framework for AIPPMS because it is simpler. Our modifications can also be used with variations of POMCP, such as POMCPOW (Sunberg and Kochenderfer 2018) for continuous action spaces. A very recent online approach attempts to address the exploration-exploitation trade-off in informative planning by Pareto-optimal Monte Carlo Tree Search (Chen and Liu 2019) but does not allow for multimodal sensing.…”
Section: Online POMDP Planning (mentioning; confidence: 99%)
“…Moreover, the curse of history is amplified by the large action space imposed by a 6-DOF manipulator. Methods have been proposed to alleviate this issue (Seiler, Kurniawati, and Singh 2015; Sunberg and Kochenderfer 2018). However, existing solvers can only perform well for problems with 3-4 dimensional continuous action spaces (Seiler, Kurniawati, and Singh 2015), while a method that can perform well for problems with 100,000 discrete actions was only recently proposed (Wang, Kurniawati, and Kroese 2018).…”
Section: Introduction (mentioning; confidence: 99%)