Controlled Markov set-chains with discounting

Kurano, Masami; Song, Jinjie; Hosaka, Masanori; Huang, Youqiang

doi:10.1017/s0021900200014959

Cited by 3 publications

(11 citation statements)

References 5 publications


“…Such a new P is correlated according to (10). Note that it is now difficult to obtain an analytical solution of (18) for policy $\pi_1$.…”

Section: Examples (mentioning)

confidence: 99%

Approximate Robust Policy Iteration Using Multilayer Perceptron Neural Networks for Discounted Infinite-Horizon Markov Decision Processes With Uncertain Correlated Transition Matrices

2010

IEEE Trans. Neural Netw.


We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. Existing robust dynamic programming methods cannot be extended to solving this class of general problems. In this paper, based on a robust optimality criterion, an approximate robust policy iteration using a multilayer perceptron neural network is proposed. It is proven that the proposed algorithm converges in finite iterations, and it converges to a stationary optimal or near-optimal policy in a probability sense. In addition, we point out that sometimes even a direct enumeration may not be applicable to addressing this class of problems. However, a direct enumeration based on our proposed maximum value approximation over the parameter space is a feasible approach. We provide further analysis to show that our proposed algorithm is more efficient than such an enumeration method for various scenarios.



“…In this section we provide a formal description of controlled Markov set-chains, following the notation of [5] (see [5] for more detailed discussion). A controlled Markov set-chain model is a four-tuple $M = (X, A, R, P = \langle \underline{p}, \overline{p} \rangle)$, where X is a finite set of states, A is a finite set of actions, $R : X \times A \to \mathbb{R}_+$ represents a bounded nonnegative reward function, and $P = \langle \underline{p}, \overline{p} \rangle$ is an "interval transition function."…”

Section: Controlled Markov Set-chains (mentioning)

confidence: 99%
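
The four-tuple quoted above can be pictured as a small data structure. The following is a minimal sketch, assuming that the two components of the interval transition function are elementwise lower and upper bounds on the transition probabilities; the class name, field names, and validation rules are hypothetical and are not taken from the cited papers.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class ControlledMarkovSetChain:
    """Hypothetical container for M = (X, A, R, P = <p_lower, p_upper>).

    States and actions are represented implicitly by array indices.
    """

    reward: np.ndarray   # shape (n_states, n_actions), bounded and nonnegative
    p_lower: np.ndarray  # shape (n_states, n_actions, n_states), lower bounds
    p_upper: np.ndarray  # shape (n_states, n_actions, n_states), upper bounds

    def __post_init__(self):
        n_states, n_actions = self.reward.shape
        expected = (n_states, n_actions, n_states)
        if self.p_lower.shape != expected or self.p_upper.shape != expected:
            raise ValueError("bound arrays must have shape (states, actions, states)")
        if np.any(self.reward < 0):
            raise ValueError("rewards must be nonnegative")
        if np.any(self.p_lower < 0) or np.any(self.p_upper > 1):
            raise ValueError("bounds must lie in [0, 1]")
        if np.any(self.p_lower > self.p_upper):
            raise ValueError("lower bounds must not exceed upper bounds")
        # Assumption: each (state, action) interval must contain at least one
        # probability distribution, i.e. sum(lower) <= 1 <= sum(upper).
        if np.any(self.p_lower.sum(axis=2) > 1 + 1e-12):
            raise ValueError("lower bounds sum to more than 1")
        if np.any(self.p_upper.sum(axis=2) < 1 - 1e-12):
            raise ValueError("upper bounds sum to less than 1")

    def contains(self, p, atol=1e-12):
        """Check whether a concrete transition array lies inside the intervals."""
        p = np.asarray(p, dtype=float)
        if p.shape != self.p_lower.shape:
            return False
        in_bounds = np.all(p >= self.p_lower - atol) and np.all(p <= self.p_upper + atol)
        normalized = np.allclose(p.sum(axis=2), 1.0, atol=1e-9)
        return bool(in_bounds and normalized)
```

Storing only the two bound arrays keeps the model compact; any concrete transition law that passes `contains` is one admissible realization of the uncertain dynamics under this reading.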

“…Kurano et al [5] prove the existence of an optimal stationary policy $\pi^*$ and establish an optimality equation uniquely satisfied by the policy's value function $V^{\pi^*}$. They also provide some results that induce a value-iteration type algorithm [6] to compute $V^{\pi^*}$ by defining relevant contraction operators (thereby obtaining $\pi^*$).…”

Section: Controlled Markov Set-chains (mentioning)

confidence: 99%
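
To illustrate how contraction operators can yield a value-iteration type computation on an interval model, here is a minimal sketch that evaluates one fixed stationary policy. It is not the operator construction of Kurano et al.; it assumes elementwise lower and upper transition bounds and uses the standard greedy step for optimizing an expectation over an interval set of distributions. The function names, the tolerance, and the zero initialization are all illustrative choices.

```python
import numpy as np


def extreme_expectation(lower, upper, v, maximize):
    """Optimize p @ v over {p : lower <= p <= upper, sum(p) == 1}.

    Greedy rule: start from the lower bounds, then hand the remaining
    probability mass to the states with the best values first.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    order = np.argsort(v)
    if maximize:
        order = order[::-1]
    p = lower.copy()
    slack = 1.0 - p.sum()
    for j in order:
        add = max(0.0, min(upper[j] - p[j], slack))
        p[j] += add
        slack -= add
    return float(p @ v)


def interval_policy_value(reward, lower, upper, beta, tol=1e-10, max_iter=100_000):
    """Iterate pessimistic and optimistic discounted-value operators.

    reward[x] is the reward in state x under the fixed policy;
    lower[x] and upper[x] bound the next-state distribution from x.
    Both operators are beta-contractions in the sup norm, so for
    0 <= beta < 1 the iteration converges from any starting point.
    """
    reward = np.asarray(reward, dtype=float)
    n = len(reward)
    v_lo = np.zeros(n)
    v_hi = np.zeros(n)
    for _ in range(max_iter):
        new_lo = np.array([
            reward[x] + beta * extreme_expectation(lower[x], upper[x], v_lo, False)
            for x in range(n)
        ])
        new_hi = np.array([
            reward[x] + beta * extreme_expectation(lower[x], upper[x], v_hi, True)
            for x in range(n)
        ])
        gap = max(np.abs(new_lo - v_lo).max(), np.abs(new_hi - v_hi).max())
        v_lo, v_hi = new_lo, new_hi
        if gap < tol:
            break
    return v_lo, v_hi
```

When the lower and upper bounds coincide, the two sequences collapse to ordinary discounted policy evaluation, which gives a quick sanity check for the sketch.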

“…Kurano et al [5] extend the usual MDP model to the case where the transition probability varies in some given domain at each decision time, and its variation is unobservable or unknown (see, e.g., [5] for example problems). In doing so, they develop a novel model called a "controlled Markov set-chain," based on Markov set-chains [3], and study an optimal control problem with a total expected discounted reward criterion under some partial order.…”

Section: Introduction (mentioning)

confidence: 99%

“…By grouping the states in a given MDP, we can induce a controlled Markov set-chain model with a much smaller state space (see [2] for a related discussion). Based on appropriately defined contraction operators, Kurano et al [5] establish an optimality equation satisfied by an optimal policy (optimal in a certain partial-order sense), and some results that induce a value-iteration [6] type algorithm for solving problems modeled by controlled Markov set-chains. A condition for policy improvement [6] for a single policy is provided, but no policy-iteration (PI) type algorithm based on the condition is discussed explicitly in their paper.…”

Section: Introduction (mentioning)

confidence: 99%
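
The remark about grouping states can be made concrete with a small sketch. One common way to induce interval bounds from an ordinary MDP is to take, for each group of states, the smallest and largest probability with which any member moves into each target group. This construction is an assumption for illustration and is not confirmed by the quoted text; the function name and the array layout are hypothetical, and rewards are left out.

```python
import numpy as np


def aggregate_to_intervals(P, groups):
    """Induce interval transition bounds by grouping the states of an MDP.

    P[x, a, y] is the transition probability of the original MDP and
    groups[x] is the index (0..k-1) of the group containing state x.
    Every group index in 0..k-1 is assumed to have at least one member.
    Returns (lower, upper), each of shape (k, n_actions, k).
    """
    P = np.asarray(P, dtype=float)
    groups = np.asarray(groups)
    n_states, n_actions, _ = P.shape
    k = int(groups.max()) + 1

    # mass[x, a, h]: probability of moving from state x into group h under a.
    mass = np.zeros((n_states, n_actions, k))
    for h in range(k):
        mass[:, :, h] = P[:, :, groups == h].sum(axis=2)

    lower = np.empty((k, n_actions, k))
    upper = np.empty((k, n_actions, k))
    for g in range(k):
        members = mass[groups == g]       # shape (group size, n_actions, k)
        lower[g] = members.min(axis=0)    # least mass any member sends to each group
        upper[g] = members.max(axis=0)    # most mass any member sends to each group
    return lower, upper
```

For example, grouping the first two of three states shrinks the state space from three to two, at the cost of replacing exact probabilities with intervals wherever the grouped states disagree.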


On Solving Controlled Markov Set-Chains via Multi-Policy Improvement

Chang; Chong

Proceedings of the 44th IEEE Conference on Decision and Control


We present formal methods of improving multiple policies for solving controlled Markov set-chains with infinite-horizon discounted reward criteria. The multi-policy improvement methods follow the spirit of parallel rollout for solving Markov decision processes (MDPs). In particular, these methods are useful for on-line control of Markov set-chains and for approximately solving MDPs via state aggregation. We further discuss issues on designing a policy-iteration type algorithm based on our policy improvement methods.


Deterministic Discounted Markov Decision Processes with Fuzzy Rewards/Costs

Cruz-Suárez; Montes-de-Oca; Israel Ortega-Gutiérrez

2023

Fuzzy Information and Engineering


The article concerns a study of infinite-horizon deterministic Markov decision processes (MDPs) for which the fuzzy environment will be presented through considering these MDPs with both fuzzy rewards and fuzzy costs. Specifically, these rewards and costs will be assumed to be of a suitable trapezoidal type. For both classes of MDPs, i.e., MDPs with fuzzy rewards and MDPs with fuzzy costs, the fuzzy total discounted function will be taken into account as the objective function, and the corresponding optimal decision problems will be considered with respect to the max order of the fuzzy numbers. For each optimal decision problem, the optimal policy and the optimal value function are related and obtained as a solution of a convenient standard MDP (i.e., a standard MDP is an MDP with a non-fuzzy reward function or a non-fuzzy cost function). Moreover, an economic growth model (EGM), a deterministic version of the linear-quadratic model (LQM), and an optimal consumption model (OCM) are given in order to clarify the theory presented. It is remarked that these models have uncountable state spaces, that the corresponding non-fuzzy versions of both the EGM and the OCM have an unbounded reward function, and that the corresponding non-fuzzy version of the LQM has an unbounded cost function.

KEYWORDS: deterministic Markov decision process; discounted criterion; fuzzy reward; fuzzy cost; trapezoidal fuzzy number

This article deals with the extension to the fuzzy context [1,2] of the class of infinite-horizon deterministic discounted Markov decision processes (MDPs), which are sequential decision models that are theoretically interesting and highly applicable, mainly in economics [3-13]. The deterministic MDPs which will be extended have Borel spaces as state and decision spaces and (possibly) noncompact restriction sets, and both cases will be considered: MDPs with rewards as well as MDPs with costs [4,5,14]; in this paper, an MDP of this class will be referred to as a standard MDP.

Moreover, for the standard MDPs considered here, there are well-known conditions for the existence of optimal policies [4,14,15]. These conditions are supposed to hold, and they permit sufficiently general MDPs to be taken into account; for instance, it is possible to consider MDPs with uncountable state spaces. This last kind is important because the technique commonly used to solve deterministic MDPs is the Euler equation [10,16,17], which involves the derivative of the value function defined on a suitable open set.

