2012 · Mathematics of Operations Research
DOI: 10.1287/moor.1110.0525

Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes

Abstract: This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if the occupancy measure of this policy can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set and coinciding with the original stationary policy outside of this set. For a stationary policy, necessary and sufficient conditions are pro…
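A hedged worked restatement of the splitting condition from the abstract (the symbols π, S₀, ν, and α are my notation, not the paper's): a stationary policy π can be split on a set of states S₀ if its occupancy measure satisfies

\[
\nu^{\pi} \;=\; \sum_{i=1}^{k} \alpha_i\, \nu^{\pi_i}, \qquad \alpha_i \ge 0, \qquad \sum_{i=1}^{k} \alpha_i = 1,
\]

where ν^σ denotes the occupancy measure of a stationary policy σ under the given initial state distribution, and each π_i selects deterministic actions on S₀ and coincides with π outside of S₀.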

Citation Types: supporting 5, mentioning 67, contrasting 0

Cited by 36 publications (72 citation statements). References: 43 publications.
“…It is known (cf. Feinberg and Rothblum [9]) that an optimal policy can be found among the initial randomizations over stationary deterministic policies. This lets the multi-armed bandit problem with constraints be formulated as:…”
Section: Optimization With Constraints (mentioning; confidence: 99%)
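The quoted formulation is truncated at the ellipsis. As an illustrative sketch only (this is the standard occupancy-measure linear program for a constrained total-reward MDP; the notation x, r, p, μ, c_w, b_w is assumed, not taken from the citing paper):

\[
\max_{x \ge 0} \; \sum_{s,a} r(s,a)\, x(s,a)
\quad \text{s.t.} \quad
\sum_{a} x(s,a) - \sum_{s',a'} p(s \mid s',a')\, x(s',a') = \mu(s) \;\; \forall s,
\qquad
\sum_{s,a} c_w(s,a)\, x(s,a) \le b_w, \;\; w = 1,\dots,W.
\]

Here x(s,a) is the expected total number of times action a is taken in state s, μ is the initial state distribution, and the W inequality rows encode the side constraints; an optimal x corresponds to a policy obtained by randomizing over stationary deterministic policies.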
“…A transient Markov decision problem (MDP) with W constraints has an optimal solution that is an initial randomization over W + 1 deterministic policies δ_1 through δ_{W+1}, each of which differs from the next at precisely one state of the MDP; see Feinberg and Rothblum [9]. When this MDP is a multi-armed bandit, these deterministic policies need not be priority rules, however.…”
Section: Structural Properties (mentioning; confidence: 99%)
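In occupancy-measure terms, and again with assumed notation, this structural result can be restated as follows: the initial randomization picks δ_i with probability α_i at time zero, so the occupancy measure of the optimal policy π* decomposes as

\[
\nu^{\pi^{*}} \;=\; \sum_{i=1}^{W+1} \alpha_i\, \nu^{\delta_i}, \qquad \alpha_i \ge 0, \qquad \sum_{i=1}^{W+1} \alpha_i = 1,
\]

where consecutive deterministic policies δ_i and δ_{i+1} differ in their action choice at exactly one state.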
“…Our paper proposes an extension of these results since, here, we consider cost functions that may be unbounded below. However, it is important to emphasize that, in other respects, the models studied in [9, 14] are more general than ours and cannot be studied with the techniques of the present paper. For instance, [9] also deals with undiscounted total-reward MDPs, and in [14] the action sets are not assumed to be compact.…”
Section: Introduction (mentioning; confidence: 96%)
“…In the references [9, 14], the cost functions are assumed to be bounded below, and the solvability of the MDP follows from [9, Theorem 9.2] and [14, Theorem 3.2]. Our paper proposes an extension of these results since, here, we consider cost functions that may be unbounded below.…”
Section: Introduction (mentioning; confidence: 99%)