Decision Theory Models for Applications in Artificial Intelligence 2012
DOI: 10.4018/978-1-60960-165-2.ch005
Inference Strategies for Solving Semi-Markov Decision Processes

Abstract: In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectori…
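The MCMC approach described in the abstract can be illustrated with a toy sketch (my own construction for illustration, not the authors' algorithm): treat the exponentiated return of a rollout as an unnormalized likelihood over the policy parameter, and run a random-walk Metropolis-Hastings chain over that parameter.

```python
import math
import random

def rollout_reward(theta, horizon=20):
    """Simulate a toy 1-D regulation task under a linear policy a = -theta * s.

    Returns exp(sum of rewards), so the return acts as an unnormalized
    likelihood for theta (higher return -> higher 'probability').
    """
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = -theta * s                       # linear feedback policy
        s = s + a + random.gauss(0.0, 0.05)  # noisy dynamics
        total += -s * s                      # quadratic cost around state 0
    return math.exp(total)

def mcmc_policy_search(n_iters=2000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings over the policy parameter theta,
    targeting a distribution proportional to the (exponentiated) return.
    Re-using the stale return estimate for the current theta makes this a
    pseudo-marginal chain despite the noisy rollouts."""
    random.seed(seed)
    theta = 0.0
    r_hat = rollout_reward(theta)  # noisy estimate of the target at theta
    trace = []
    for _ in range(n_iters):
        prop = theta + random.gauss(0.0, step)   # random-walk proposal
        r_prop = rollout_reward(prop)
        # Accept with probability min(1, r_prop / r_hat).
        if random.random() < min(1.0, r_prop / max(r_hat, 1e-300)):
            theta, r_hat = prop, r_prop
        trace.append(theta)
    return trace

trace = mcmc_policy_search()
posterior_mean = sum(trace[-500:]) / 500  # chain concentrates near the optimal gain
```

For this toy dynamics the optimal gain is near theta = 1 (which drives the state to zero in one step), and the chain's tail samples concentrate around it. The paper's contribution concerns making such chains mix well in higher-dimensional parameter spaces, where naive versions like this one suffer from strong parameter-trajectory correlations.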

Cited by 18 publications (23 citation statements). References 20 publications.
“…An implication of the reformulation of policy optimization as an inference problem is that it opens the door to a variety of inference techniques and allows continuous [7], hierarchical [18], reinforcement learning [21] and multi-agent [8] variants to be tackled with the same machinery. Nevertheless, an important problem remains: policy optimization is inherently non-convex and therefore the DBN mixture reformulation does not get rid of local optima issues.…”
Section: Planning As Inference
confidence: 99%
“…However, in the experiments we show that EM almost always achieves values similar to those of the NLP-based solver for optimizing FSCs (Amato et al., 2010), and much better than DEC-BPI (Bernstein et al., 2009). Key potential advantages of using EM lie in its ability to easily generalize to much richer representations than currently possible for Dec-POMDPs, such as hierarchical controllers (Toussaint et al., 2008) and continuous state and action spaces (Hoffman et al., 2009b). Another important advantage is the ability to generalize the solver to larger multiagent systems with more than 2 agents by exploiting the relative independence among agents, as we will show in later sections.…”
Section: Policy Optimization Via Expectation Maximization
confidence: 84%
“…In future work, we plan to explore several such directions. We are interested in exploring the overlap of stochastic control theory and multiagent planning in continuous action and state space models similar to the work of Hoffman et al. (2009a, 2009b). We also plan to further explore ways to overcome the effect of local optima on the solution quality achieved by the EM algorithm.…”
Section: Results
confidence: 99%
“…It has been recognized by Toussaint and Storkey (2006) and Hoffman et al. (2009b) that it is possible to view (61) as the normalization constant for an artificial trans-dimensional probability distribution, defined on…”
Section: Inference Strategies For Optimal Control Problems
confidence: 99%
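The trans-dimensional construction this excerpt alludes to can be sketched as follows (reconstructed from the cited Toussaint and Storkey line of work, since the excerpt's equation (61) is truncated; the symbols here are illustrative, not the source's notation):

```latex
% Expected return as a normalization constant: a mixture over horizons K
% and trajectories \tau_{0:K}, with the reward emitted at the final step.
% p(K) (e.g. p(K) \propto \gamma^K) plays the role of the discount.
V(\theta) = \sum_{K=0}^{\infty} \sum_{\tau_{0:K}} p(K)\, p(\tau_{0:K} \mid \theta)\, r(s_K, a_K)

% The induced trans-dimensional distribution, normalized by V(\theta):
q(K, \tau_{0:K} \mid \theta) = \frac{p(K)\, p(\tau_{0:K} \mid \theta)\, r(s_K, a_K)}{V(\theta)}
```

Because the dimensionality of a sample (K, τ₀:K) varies with the horizon K, inference over q requires trans-dimensional methods such as reversible-jump MCMC, which is what motivates the chapter's inference strategies.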