2013
DOI: 10.1007/978-3-642-40196-1_26

The Steady-State Control Problem for Markov Decision Processes

Abstract: This paper addresses a control problem for probabilistic models in the setting of Markov decision processes (MDPs). We are interested in the steady-state control problem, which asks, given an ergodic MDP M and a distribution δ_goal, whether there exists a (history-dependent randomized) policy π ensuring that the steady-state distribution of M under π is exactly δ_goal. We first show that stationary randomized policies suffice to achieve a given steady-state distribution. Then we infer that th…
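
To make the object of the problem statement concrete, here is a minimal sketch, assuming a hypothetical 2-state, 2-action MDP and a hypothetical target δ_goal (none of this data comes from the paper): it builds the Markov chain induced by a stationary randomized policy and checks whether the chain's steady-state distribution matches the target.

```python
# Minimal sketch (not the paper's construction): checking whether a given
# stationary randomized policy realizes a target steady-state distribution.
# The MDP, the policy, and delta_goal below are toy data for illustration.
import numpy as np

# P[s, a, s'] = probability of moving to s' when playing action a in state s
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1
])

# Stationary randomized policy: pi[s, a] = probability of action a in state s
pi = np.array([
    [0.3, 0.7],
    [0.6, 0.4],
])

# Chain induced by the policy: P_pi[s, s'] = sum_a pi[s, a] * P[s, a, s']
P_pi = np.einsum('sa,sat->st', pi, P)

# Steady-state distribution: left eigenvector of P_pi for eigenvalue 1,
# i.e. the solution of delta = delta @ P_pi whose entries sum to 1
eigvals, eigvecs = np.linalg.eig(P_pi.T)
idx = np.argmin(np.abs(eigvals - 1.0))
delta = np.real(eigvecs[:, idx])
delta /= delta.sum()

delta_goal = np.array([0.37, 0.63])   # hypothetical target distribution
print(delta, np.allclose(delta, delta_goal, atol=1e-2))
```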

Cited by 9 publications (21 citation statements). References 9 publications.
“…In adversarial environments the problem reduces to games and for probabilistic environments to MDPs, with multiple mean-payoff objectives [16]. (B) The problem of synthesis of steady-state distributions for ergodic MDPs was considered in [4]. The problem can model para.…”
Section: Experimental Results: Case Studies (mentioning)
confidence: 99%
“…be modeled with multiple mean-payoff objectives by considering indicator reward functions r_s, for each state s, that assign reward 1 to every action enabled in s and 0 to all other actions. The steady-state distribution synthesis question of [4] then reduces to the existence question for multiple mean-payoff MDPs.…”
Section: Experimental Results: Case Studies (mentioning)
confidence: 99%
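
The reduction quoted above can be checked numerically: the mean payoff of the indicator reward r_s along a run is exactly the fraction of time the run spends in s, so fixing that mean payoff to δ_goal(s) pins down the steady-state mass of s. The sketch below, reusing the hypothetical policy-induced chain from the earlier sketch, verifies this by simulation.

```python
# Hedged sketch of the quoted reduction: simulate a long run of the
# policy-induced chain (toy data, as above) and check that the empirical
# mean payoffs of the indicator rewards r_0, r_1 approach the chain's
# steady-state distribution (roughly (0.366, 0.634) for this chain).
import numpy as np

rng = np.random.default_rng(0)
P_pi = np.array([[0.41, 0.59],
                 [0.34, 0.66]])   # policy-induced chain (toy data)

steps, state = 200_000, 0
visits = np.zeros(2)
for _ in range(steps):
    visits[state] += 1            # r_s pays 1 exactly when the run is in s
    state = rng.choice(2, p=P_pi[state])

print(visits / steps)             # empirical mean payoffs of (r_0, r_1)
```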
“…The steady-state control problem was introduced in [Akshay et al., 2013], treating the case of recurrent MDPs and showing that the problem is in PSPACE via quadratic programming. It is combined with LRA reward maximization, giving rise to steady-state policy synthesis, in [Velasquez, 2019].…”
Section: Related Work (mentioning)
confidence: 99%
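
For the recurrent case discussed here, one standard way to phrase the search for a stationary policy is a linear feasibility problem over occupation measures x(s, a); the sketch below uses that formulation purely for illustration and should not be read as the exact quadratic program of [Akshay et al., 2013]. The MDP data is the same hypothetical toy example as above.

```python
# Hedged sketch, not the paper's exact program: for an ergodic MDP, a
# stationary randomized policy realizing delta_goal exists iff the linear
# feasibility problem over occupation measures x(s, a) below has a solution;
# the policy is then recovered as pi(a|s) = x(s, a) / delta_goal(s).
import numpy as np
from scipy.optimize import linprog

P = np.array([                     # toy MDP: P[s, a, s']
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
S, A = 2, 2
delta_goal = np.array([0.37, 0.63])

# Equality constraints over the flattened vector x[(s, a)]
A_eq, b_eq = [], []
for s in range(S):                 # out-mass: sum_a x(s, a) = delta_goal(s)
    row = np.zeros(S * A)
    row[s * A:(s + 1) * A] = 1.0
    A_eq.append(row); b_eq.append(delta_goal[s])
for t in range(S):                 # in-mass: sum_{s,a} x(s, a) P(t|s, a) = delta_goal(t)
    A_eq.append(P[:, :, t].reshape(-1)); b_eq.append(delta_goal[t])

res = linprog(c=np.zeros(S * A), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (S * A))
if res.success:
    x = res.x.reshape(S, A)
    pi = x / x.sum(axis=1, keepdims=True)   # stationary randomized policy
    print(pi)
else:
    print("delta_goal is not achievable by any stationary policy")
```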
“…In terms of the automata representation, the policy is 2-memory, remembering whether a step has already been taken; see Fig. 5 in Appendix A. Consequently, memory may be necessary, in contrast to the claim of [Velasquez, 2019] that memoryless policies suffice by [Akshay et al., 2013], which holds only in the setting of recurrent chains. Moreover, the combination with LTL may require even unbounded memory: Example 2.…”
Section: Problem Statement and Examples (mentioning)
confidence: 99%
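
As a purely illustrative skeleton (the concrete distributions of the cited Fig. 5 are not available here), a 2-memory policy of the kind described can be written as two action tables plus a single bit of memory recording whether a step has already been taken:

```python
# Hypothetical skeleton of a 2-memory policy: one action distribution is
# played before the first step, a different stationary one afterwards.
import numpy as np

class TwoMemoryPolicy:
    def __init__(self, pi_first, pi_rest):
        self.pi_first = pi_first    # pi_first[s, a]: used on the first step
        self.pi_rest = pi_rest      # pi_rest[s, a]:  used from then on
        self.stepped = False        # the single bit of memory

    def act(self, state, rng):
        table = self.pi_rest if self.stepped else self.pi_first
        self.stepped = True         # memory update: a step has now been taken
        return rng.choice(table.shape[1], p=table[state])

rng = np.random.default_rng(0)
pol = TwoMemoryPolicy(np.array([[1.0, 0.0]]), np.array([[0.5, 0.5]]))
print(pol.act(0, rng), pol.act(0, rng))  # first action fixed, later randomized
```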