Perturbation and stability theory for Markov control problems

Abbad, Mohammed; Filar, Jerzy A.

doi:10.1109/9.159584

Cited by 50 publications (21 citation statements)

References 11 publications


“…Obviously, $v_P^{\pi_s}$ is continuous in $P$ [11]. However, $P$ can be discontinuous in $\theta$ over $\Theta$, which may lead to discontinuous value functions $v_P^{\pi_s}$ in $\theta$.…”

Section: Introduction (mentioning, confidence: 94%)
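The continuity claim can be checked numerically. This is a minimal sketch, assuming a 2-state chain under one fixed stationary policy with cost vector `c` and discount factor `gamma`. The value is $v = (I - \gamma P)^{-1} c$, so a small stochastic perturbation of $P$ moves $v$ only slightly. A hypothetical parametrization `P_of_theta` that jumps at $\theta = 0.5$ makes $v$ jump as well. The numbers and function names are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def policy_value(P, c, gamma):
    """Discounted cost of a fixed stationary policy: v = (I - gamma * P)^{-1} c."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, c)

def P_of_theta(theta):
    """Hypothetical parametrized transition matrix, discontinuous at theta = 0.5."""
    if theta < 0.5:
        return np.array([[0.9, 0.1], [0.2, 0.8]])
    return np.array([[0.1, 0.9], [0.2, 0.8]])

c = np.array([1.0, 0.0])  # per-state cost under the fixed policy (assumed)
gamma = 0.9               # discount factor (assumed)

# Continuity in P: a tiny perturbation that keeps rows stochastic barely changes v.
P = P_of_theta(0.0)
eps = 1e-6
P_perturbed = P + eps * np.array([[-1.0, 1.0], [0.0, 0.0]])
small_change = np.max(np.abs(policy_value(P_perturbed, c, gamma) - policy_value(P, c, gamma)))

# Discontinuity in theta: P jumps at 0.5, so v jumps too.
jump = np.max(np.abs(policy_value(P_of_theta(0.51), c, gamma)
                     - policy_value(P_of_theta(0.49), c, gamma)))

print(f"change under P-perturbation: {small_change:.2e}")
print(f"change across theta = 0.5:  {jump:.3f}")
```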

Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices

2007

2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning


In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are clearly classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions for all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
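The setting in this abstract can be made concrete with a brute-force sketch. Several assumptions apply:

- The uncertainty is given as a finite list of models. Each model is one joint realization of $\theta$, so all rows of every action's transition matrix change together (the correlated case).
- The "quadratic total value" criterion is read as the squared Euclidean norm of the value vector. The paper's exact definition may differ.
- The toy data and helper names are hypothetical.

The paper develops robust policy iteration, whereas this sketch simply enumerates all $|A|^n$ deterministic stationary policies.

```python
import itertools
import numpy as np

def policy_matrices(P, c, policy):
    """Transition rows and costs selected by a deterministic stationary policy.

    P has shape (A, n, n) with P[a][s] the next-state distribution; c has shape (n, A).
    """
    n = c.shape[0]
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    c_pi = np.array([c[s, policy[s]] for s in range(n)])
    return P_pi, c_pi

def value(P_pi, c_pi, gamma):
    """Expected total discounted cost v = (I - gamma * P_pi)^{-1} c_pi."""
    return np.linalg.solve(np.eye(len(c_pi)) - gamma * P_pi, c_pi)

def robust_enumeration(models, c, gamma):
    """Minimize over deterministic policies the worst case over models of ||v||^2 (assumed criterion)."""
    n, A = c.shape
    best_policy, best_score = None, np.inf
    for policy in itertools.product(range(A), repeat=n):
        worst = max(
            float(np.sum(value(*policy_matrices(P, c, policy), gamma) ** 2))
            for P in models
        )
        if worst < best_score:  # strict comparison: ties keep the first policy found
            best_policy, best_score = policy, worst
    return best_policy, best_score

# Toy data (assumed): state 1 is an absorbing, cost-free goal.
# Action 0 ("safe") behaves the same in every model; the success probability
# of action 1 ("risky") depends on which model (realization of theta) holds.
theta_a = np.array([[[0.5, 0.5], [0.0, 1.0]],   # action 0
                    [[0.1, 0.9], [0.0, 1.0]]])  # action 1
theta_b = np.array([[[0.5, 0.5], [0.0, 1.0]],
                    [[0.9, 0.1], [0.0, 1.0]]])
c = np.array([[1.0, 1.0], [0.0, 0.0]])  # c[s, a]
gamma = 0.9

robust_policy, robust_score = robust_enumeration([theta_a, theta_b], c, gamma)
nominal_policy, _ = robust_enumeration([theta_a], c, gamma)
print("robust:", robust_policy, robust_score)
print("nominal (theta_a only):", nominal_policy)
```

Under model `theta_a` alone the risky action looks best in state 0. Once `theta_b` is also possible, the worst case makes the safe action preferable. Enumeration needs one linear solve per policy and model, which grows exponentially in the number of states. This motivates policy-iteration-type methods for larger problems.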

“…Thus, optimal solutions via classic DP are difficult to attain. It is not difficult to imagine that the estimated transition probabilities may be far from the true values due to noise and other errors associated with the estimation process, or the estimation error may be nontrivial such that it results in significant deviations from true optimal solutions [1], [8], [12]. Therefore, the ideas of set estimation for transition matrices with high confidence and robust DP were proposed to alleviate some of the deficits from both inaccurate transition matrix models and point estimation.…”

Section: Introduction (mentioning, confidence: 99%)

Approximate Robust Policy Iteration Using Multilayer Perceptron Neural Networks for Discounted Infinite-Horizon Markov Decision Processes With Uncertain Correlated Transition Matrices

2010

IEEE Trans. Neural Netw.


We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. Existing robust dynamic programming methods cannot be extended to solving this class of general problems. In this paper, based on a robust optimality criterion, an approximate robust policy iteration using a multilayer perceptron neural network is proposed. It is proven that the proposed algorithm converges in finite iterations, and it converges to a stationary optimal or near-optimal policy in a probability sense. In addition, we point out that sometimes even a direct enumeration may not be applicable to addressing this class of problems. However, a direct enumeration based on our proposed maximum value approximation over the parameter space is a feasible approach. We provide further analysis to show that our proposed algorithm is more efficient than such an enumeration method for various scenarios.


“…[12], [13], [14], [15]. In this paper we exploit the structured LP formulation proposed in [9] and ACCPM to provide an efficient algorithm for solving ergodic MDPs with strong and weak interactions.…”

Section: Markov Decision Processes (MDPs) or Their Control Counterpar (mentioning, confidence: 99%)

Two-Time Scale Controlled Markov Chains: A Decomposition and Parallel Processing Approach

Haurie; Moresino

2007

IEEE Trans. Automat. Contr.


This paper deals with a class of ergodic control problems for systems described by Markov chains with strong and weak interactions. These systems are composed of a set of m subchains that are weakly coupled. Using results already available in the literature, one formulates a limit control problem whose solution can be obtained via an associated nondifferentiable convex programming (NDCP) problem. The NDCP problem is solved with the Analytic Center Cutting Plane Method (ACCPM). ACCPM implements a dialogue between two components. On one hand, a master program computes the analytic center of a localization set containing the solution. On the other hand, an oracle proposes cutting planes that reduce the size of the localization set at each main iteration. The implementation has two notable characteristics:

(i) the oracle proposes cutting planes by solving reduced-size Markov Decision Problems (MDP) via a linear program (LP) or a policy iteration method;

(ii) several cutting planes can be proposed simultaneously through a parallel implementation on m processors.

The paper concentrates on these two aspects. It demonstrates the resulting computational speed-up on a large-scale MDP obtained from the numerical approximation "à la Kushner-Dupuis" of a singularly perturbed hybrid stochastic control problem.
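The abstract states that the oracle solves reduced-size MDPs by LP or policy iteration. For the ergodic (average-cost) setting, a textbook policy iteration for a small unichain MDP looks roughly like the sketch below. It is a generic illustration with assumed toy data and hypothetical function names, not Haurie and Moresino's implementation. It omits both the decomposition into subchains and ACCPM itself.

```python
import numpy as np

def evaluate(P_pi, c_pi):
    """Average-cost policy evaluation for a unichain policy.

    Solves g + h[s] = c_pi[s] + sum_j P_pi[s, j] h[j] with the normalization h[0] = 0.
    """
    n = len(c_pi)
    M = np.eye(n) - P_pi
    M[:, 0] = 1.0  # h[0] = 0, so column 0 is reused as the coefficient of the gain g
    x = np.linalg.solve(M, c_pi)
    g = x[0]
    h = x.copy()
    h[0] = 0.0
    return g, h

def policy_iteration(P, c, max_iter=100):
    """P has shape (A, n, n); c has shape (n, A). Returns (policy, gain g, bias h)."""
    _, n, _ = P.shape
    states = np.arange(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(P[policy, states], c[states, policy])
        q = c.T + P @ h  # q[a, s] = c[s, a] + sum_j P[a, s, j] h[j]
        new_policy = policy.copy()
        for s in range(n):
            a_best = int(np.argmin(q[:, s]))
            # switch only on strict improvement, to avoid cycling between tied actions
            if q[a_best, s] < q[policy[s], s] - 1e-12:
                new_policy[s] = a_best
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")

# Toy ergodic MDP (assumed): 2 states, 2 actions.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],   # action 0
              [[0.9, 0.1], [0.9, 0.1]]])  # action 1
c = np.array([[2.0, 1.0], [3.0, 4.0]])    # c[s, a]
policy, g, h = policy_iteration(P, c)
print(policy, g)
```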

