Decision Theory Models for Applications in Artificial Intelligence 2012
DOI: 10.4018/978-1-60960-165-2.ch005
Inference Strategies for Solving Semi-Markov Decision Processes

Abstract: In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectori…
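The MCMC approach described in the abstract can be illustrated with a toy sketch (my own construction for illustration, not the authors' algorithm): treat the exponentiated return of a rollout as an unnormalized likelihood over the policy parameter, and run a random-walk Metropolis-Hastings chain over that parameter.

```python
import math
import random

def rollout_reward(theta, horizon=20):
    """Simulate a toy 1-D regulation task under a linear policy a = -theta * s.

    Returns exp(sum of rewards), so the return acts as an unnormalized
    likelihood for theta (higher return -> higher 'probability').
    """
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = -theta * s                       # linear feedback policy
        s = s + a + random.gauss(0.0, 0.05)  # noisy dynamics
        total += -s * s                      # quadratic cost around state 0
    return math.exp(total)

def mcmc_policy_search(n_iters=2000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings over the policy parameter theta,
    targeting a distribution proportional to the (exponentiated) return.
    Re-using the stale return estimate for the current theta makes this a
    pseudo-marginal chain despite the noisy rollouts."""
    random.seed(seed)
    theta = 0.0
    r_hat = rollout_reward(theta)  # noisy estimate of the target at theta
    trace = []
    for _ in range(n_iters):
        prop = theta + random.gauss(0.0, step)   # random-walk proposal
        r_prop = rollout_reward(prop)
        # Accept with probability min(1, r_prop / r_hat).
        if random.random() < min(1.0, r_prop / max(r_hat, 1e-300)):
            theta, r_hat = prop, r_prop
        trace.append(theta)
    return trace

trace = mcmc_policy_search()
posterior_mean = sum(trace[-500:]) / 500  # chain concentrates near the optimal gain
```

For this toy dynamics the optimal gain is near theta = 1 (which drives the state to zero in one step), and the chain's tail samples concentrate around it. The paper's contribution concerns making such chains mix well in higher-dimensional parameter spaces, where naive versions like this one suffer from strong parameter-trajectory correlations.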

Cited by 18 publications (23 citation statements). References 20 publications.
“…An implication of the reformulation of policy optimization as an inference problem is that it opens the door to a variety of inference techniques and allows continuous [7], hierarchical [18], reinforcement learning [21] and multi-agent [8] variants to be tackled with the same machinery. Nevertheless, an important problem remains: policy optimization is inherently non-convex and therefore the DBN mixture reformulation does not get rid of local optima issues.…”
Section: Planning As Inference
confidence: 99%
“…However, in the experiments we show that EM almost always achieves values similar to those of the NLP-based solver for optimizing FSCs (Amato et al., 2010), and much better than DEC-BPI (Bernstein et al., 2009). Key potential advantages of using EM lie in its ability to easily generalize to much richer representations than currently possible for Dec-POMDPs, such as hierarchical controllers (Toussaint et al., 2008) and continuous state and action spaces (Hoffman et al., 2009b). Another important advantage is the ability to generalize the solver to larger multiagent systems with more than 2 agents by exploiting the relative independence among agents, as we will show in later sections.…”
Section: Policy Optimization Via Expectation Maximization
confidence: 84%
“…In future work, we plan to explore several such directions. We are interested in exploring the overlap of stochastic control theory and multiagent planning in continuous action and state space models similar to the work of Hoffman et al. (2009a, 2009b). We also plan to further explore ways to overcome the effect of local optima on the solution quality achieved by the EM algorithm.…”
Section: Results
confidence: 99%
“…It has been recognized by Toussaint and Storkey (2006) and Hoffman et al. (2009b) that it is possible to view (61) as the normalization constant for an artificial trans-dimensional probability distribution, defined on…”
Section: Inference Strategies For Optimal Control Problems
confidence: 99%
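The trans-dimensional construction this excerpt alludes to can be sketched as follows (reconstructed from the cited Toussaint and Storkey line of work, since the excerpt's equation (61) is truncated; the symbols here are illustrative, not the source's notation):

```latex
% Expected return as a normalization constant: a mixture over horizons K
% and trajectories \tau_{0:K}, with the reward emitted at the final step.
% p(K) (e.g. p(K) \propto \gamma^K) plays the role of the discount.
V(\theta) = \sum_{K=0}^{\infty} \sum_{\tau_{0:K}} p(K)\, p(\tau_{0:K} \mid \theta)\, r(s_K, a_K)

% The induced trans-dimensional distribution, normalized by V(\theta):
q(K, \tau_{0:K} \mid \theta) = \frac{p(K)\, p(\tau_{0:K} \mid \theta)\, r(s_K, a_K)}{V(\theta)}
```

Because the dimensionality of a sample (K, τ₀:K) varies with the horizon K, inference over q requires trans-dimensional methods such as reversible-jump MCMC, which is what motivates the chapter's inference strategies.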