1998
DOI: 10.1017/s0021900200014959

Controlled Markov set-chains with discounting

Abstract: In the framework of discounted Markov decision processes, we consider the case that the transition probability varies in some given domain at each time and its variation is unknown or unobservable. To this end we introduce a new model, named controlled Markov set-chains, based on Markov set-chains, and discuss its optimization under some partial order. Also, a numerical example is given to explain the theoretical results and the computation.

Cited by 3 publications
(11 citation statements)
References 5 publications
“…Such a new P is correlated according to (10). Note that it is now difficult to obtain an analytical solution of (18) for policy π1.…”
Section: Examples | Citation type: mentioning
confidence: 99%
“…In this section we provide a formal description of controlled Markov set-chains, following the notation of [5] (see [5] for more detailed discussion). A controlled Markov set-chain model is a four-tuple $M = (X, A, R, P = \langle \underline{p}, \overline{p} \rangle)$, where $X$ is a finite set of states, $A$ is a finite set of actions, $R : X \times A \to \mathbb{R}^{+}$ represents a bounded nonnegative reward function, and $P = \langle \underline{p}, \overline{p} \rangle$ is an "interval transition function."…”
Section: Controlled Markov Set-chains | Citation type: mentioning
confidence: 99%
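
The four-tuple quoted above can be pictured as a small data structure. The following is a minimal sketch, not code from the cited papers: the class name `SetChainModel`, its field names, and the `validate`/`contains` helpers are all hypothetical, and the interval transition function is assumed to be given as elementwise lower and upper bounds indexed by `[state][action][next_state]`.

```python
from dataclasses import dataclass
from typing import List

# Indexed as [state][action][next_state]; a hypothetical layout for illustration.
Bounds = List[List[List[float]]]


@dataclass
class SetChainModel:
    """Hypothetical container for M = (X, A, R, <p_lo, p_hi>)."""
    reward: List[List[float]]  # reward[x][a], assumed bounded and nonnegative
    p_lo: Bounds               # elementwise lower transition bounds
    p_hi: Bounds               # elementwise upper transition bounds

    def validate(self, tol: float = 1e-12) -> None:
        """Raise ValueError if the bounds cannot contain a probability vector."""
        for x, row in enumerate(self.reward):
            for a, r in enumerate(row):
                if r < 0:
                    raise ValueError(f"negative reward at ({x}, {a})")
                lo, hi = self.p_lo[x][a], self.p_hi[x][a]
                if any(l < 0 or l > h or h > 1 for l, h in zip(lo, hi)):
                    raise ValueError(f"need 0 <= lo <= hi <= 1 at ({x}, {a})")
                # The interval is nonempty only if sum(lo) <= 1 <= sum(hi).
                if sum(lo) > 1 + tol or sum(hi) < 1 - tol:
                    raise ValueError(f"empty interval at ({x}, {a})")

    def contains(self, p: Bounds, tol: float = 1e-12) -> bool:
        """True if p is a transition function lying inside the interval."""
        for x, row in enumerate(self.reward):
            for a in range(len(row)):
                probs = p[x][a]
                if abs(sum(probs) - 1.0) > tol:
                    return False
                lo, hi = self.p_lo[x][a], self.p_hi[x][a]
                if any(v < l - tol or v > h + tol
                       for v, l, h in zip(probs, lo, hi)):
                    return False
        return True


# Two states, one action; the numbers are made up for illustration.
model = SetChainModel(
    reward=[[1.0], [0.0]],
    p_lo=[[[0.2, 0.5]], [[0.0, 0.6]]],
    p_hi=[[[0.5, 0.8]], [[0.4, 1.0]]],
)
model.validate()
inside = model.contains([[[0.3, 0.7]], [[0.1, 0.9]]])
outside = model.contains([[[0.6, 0.4]], [[0.1, 0.9]]])
```

Here `inside` is a transition function within the bounds, while `outside` violates the upper bound on the first entry. Storing only the two bounding arrays keeps the model finite even though the set of admissible transition functions is uncountable.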
“…Kurano et al. [5] prove the existence of an optimal stationary policy $\pi^*$ and establish an optimality equation uniquely satisfied by the policy's value function $V^{\pi^*}$. They also provide some results that induce a value-iteration type algorithm [6] to compute $V^{\pi^*}$ by defining relevant contraction operators (thereby obtaining $\pi^*$).…”
Section: Controlled Markov Set-chains | Citation type: mentioning
confidence: 99%
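
To make the idea of a value-iteration type algorithm over interval transition probabilities concrete, here is a minimal sketch. It is not the operator construction of [5] or the algorithm of [6]; it assumes elementwise bounds `p_lo` and `p_hi` indexed by `[state][action][next_state]`, a discount factor `beta` in [0, 1), and it computes either a worst-case or a best-case value function. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def extreme_expectation(lo: Sequence[float], hi: Sequence[float],
                        v: Sequence[float], worst: bool) -> float:
    """Min (worst=True) or max of sum_y p[y] * v[y] over lo <= p <= hi, sum(p) = 1.

    Greedy rule: start from the lower bounds, then hand the remaining
    probability mass to the states with the smallest (or largest) values.
    """
    p = list(lo)
    slack = 1.0 - sum(lo)
    order = sorted(range(len(v)), key=lambda y: v[y], reverse=not worst)
    for y in order:
        add = min(hi[y] - lo[y], slack)
        p[y] += add
        slack -= add
    return sum(py * vy for py, vy in zip(p, v))


def interval_value_iteration(reward: List[List[float]],
                             p_lo: List[List[List[float]]],
                             p_hi: List[List[List[float]]],
                             beta: float, worst: bool,
                             tol: float = 1e-10,
                             max_iter: int = 10_000) -> List[float]:
    """Iterate v <- max_a [R(x, a) + beta * extreme expectation of v]."""
    v = [0.0] * len(reward)
    for _ in range(max_iter):
        new_v = [
            max(reward[x][a]
                + beta * extreme_expectation(p_lo[x][a], p_hi[x][a], v, worst)
                for a in range(len(reward[x])))
            for x in range(len(reward))
        ]
        if max(abs(n - o) for n, o in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Two states, one action; the numbers are made up for illustration.
reward = [[1.0], [0.0]]
p_lo = [[[0.2, 0.5]], [[0.0, 0.6]]]
p_hi = [[[0.5, 0.8]], [[0.4, 1.0]]]
v_low = interval_value_iteration(reward, p_lo, p_hi, beta=0.9, worst=True)
v_high = interval_value_iteration(reward, p_lo, p_hi, beta=0.9, worst=False)
```

Because each update is a contraction with modulus `beta` in the sup norm, the iteration converges from any starting vector. When the lower and upper bounds coincide, the sketch reduces to ordinary value iteration for a discounted Markov decision process.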