The Steady-State Control Problem for Markov Decision Processes

Akshay, S.; Bertrand, Nathalie; Haddad, Serge; Hélouët, Loïc

doi:10.1007/978-3-642-40196-1_26

Cited by 9 publications

(21 citation statements)

References 9 publications


“…In adversarial environments the problem reduces to games and for probabilistic environments to MDPs, with multiple mean-payoff objectives [16]. (B) The problem of synthesis of steady state distributions for ergodic MDPs was considered in [4]. The problem can model para.…”

Section: Experimental Results: Case Studies (mentioning)

confidence: 99%


MultiGain: A Controller Synthesis Tool for MDPs with Multiple Mean-Payoff Objectives

Brázdil, Chatterjee, Forejt, et al. 2015

Lecture Notes in Computer Science


Abstract. We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i) generating strategies and exploring them for simulation, and checking them with respect to other properties; and (ii) generating an approximate Pareto curve for two mean-payoff objectives. In addition, we present a new practical algorithm for the analysis of MDPs with multiple mean-payoff objectives under memoryless strategies.


“…be modeled with multiple mean-payoff objectives by considering indicator reward functions r_s, for each state s, that assign reward 1 to every action enabled in s and 0 to all other actions. The steady state distribution synthesis question of [4] then reduces to the existence question for multiple mean-payoff MDPs.…”

Section: Experimental Results: Case Studies (mentioning)
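
The reduction described in this citation statement can be illustrated with a minimal Python sketch. It is not taken from MultiGain or from the cited paper: the two-state MDP, the policy, and all function names are hypothetical. It assumes a memoryless randomized policy whose induced Markov chain is irreducible and aperiodic, so that power iteration converges to the unique stationary distribution. Under that assumption, the mean payoff of the indicator reward for state s equals the steady-state probability of s.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = {next_state: probability}.
TRANSITIONS = {
    0: {"a": {0: 0.5, 1: 0.5}, "b": {1: 1.0}},
    1: {"a": {0: 1.0}, "b": {0: 0.5, 1: 0.5}},
}

# Hypothetical memoryless randomized policy: POLICY[state][action] = probability.
POLICY = {0: {"a": 1.0}, 1: {"a": 0.5, "b": 0.5}}


def induced_chain(transitions, policy):
    """Markov chain obtained by fixing a memoryless policy in the MDP."""
    states = sorted(transitions)
    chain = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for action, weight in policy[s].items():
            for t, p in transitions[s][action].items():
                chain[s][t] += weight * p
    return chain


def stationary_distribution(chain, iterations=1000):
    """Power iteration; assumes the chain is irreducible and aperiodic."""
    states = sorted(chain)
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        dist = {t: sum(dist[s] * chain[s][t] for s in states) for t in states}
    return dist


def indicator_reward(target):
    """Reward 1 for every action taken in `target`, 0 for all other actions."""
    return lambda state, action: 1.0 if state == target else 0.0


def mean_payoff(transitions, policy, reward):
    """Long-run average reward of a policy under the ergodicity assumption."""
    dist = stationary_distribution(induced_chain(transitions, policy))
    return sum(
        dist[s] * weight * reward(s, action)
        for s in dist
        for action, weight in policy[s].items()
    )


def achieves(transitions, policy, target_distribution, tolerance=1e-9):
    """Check a target steady-state distribution via one mean payoff per state."""
    return all(
        abs(mean_payoff(transitions, policy, indicator_reward(s)) - prob) <= tolerance
        for s, prob in target_distribution.items()
    )


if __name__ == "__main__":
    for s in sorted(TRANSITIONS):
        print(s, mean_payoff(TRANSITIONS, POLICY, indicator_reward(s)))
    print(achieves(TRANSITIONS, POLICY, {0: 0.6, 1: 0.4}))
```

The sketch only checks one fixed policy against a target distribution. The synthesis question, whether some policy achieves the target, is what the quoted reduction hands to a solver for multiple mean-payoff objectives.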

confidence: 99%

MultiGain: A Controller Synthesis Tool for MDPs with Multiple Mean-Payoff Objectives

Brázdil, Chatterjee, Forejt, et al. 2015

Lecture Notes in Computer Science


“…The steady-state control was introduced in [Akshay et al, 2013], treating the case of recurrent MDP and showing the problem is in PSPACE by quadratic programming. It is combined with LRA reward maximization, giving rise to steady-state policy synthesis, in [Velasquez, 2019].…”

Section: Related Work (mentioning)

confidence: 99%

“…In terms of the automata representation, the policy is 2-memory, remembering whether a step has been already taken, see Fig. 5 in Appendix A. Consequently, memory may be necessary, in contrast to the claim of [Velasquez, 2019] that memoryless policies are sufficient by [Akshay et al, 2013], which holds only for the setting with recurrent chains. Moreover, the combination with LTL may require even unbounded memory: Example 2.…”

Section: Problem Statement and Examples (mentioning)

confidence: 99%

“…On the other hand, Steady-State Control (SSC, a.k.a. Steady-State Policy Synthesis) [Akshay et al, 2013] constrains the frequency with which states are visited, providing a more quantitative and more behavioural perspective (in terms of states of the system, as opposed to logic-based or reward-based specifications). Recently, it has started receiving more attention also in AI planning [Velasquez, 2019; Atia et al, 2020], improving the theoretical complexity and its applicability to a wider class of MDP (although still being quite restrictive on the class of policies, see below).…”

Section: Introduction (mentioning)

confidence: 99%


LTL-Constrained Steady-State Policy Synthesis

Křetínský 2021

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence


Decision-making policies for agents are often synthesized with the constraint that a formal specification of behaviour is satisfied. Here we focus on infinite-horizon properties. On the one hand, Linear Temporal Logic (LTL) is a popular example of a formalism for qualitative specifications. On the other hand, Steady-State Policy Synthesis (SSPS) has recently received considerable attention as it provides a more quantitative and more behavioural perspective on specifications, in terms of the frequency with which states are visited. Finally, rewards provide a classic framework for quantitative properties. In this paper, we study Markov decision processes (MDP) with the specification combining all these three types. The derived policy maximizes the reward among all policies ensuring the LTL specification with the given probability and adhering to the steady-state constraints. To this end, we provide a unified solution reducing the multi-type specification to a multi-dimensional long-run average reward. This is enabled by Limit-Deterministic Büchi Automata (LDBA), recently studied in the context of LTL model checking on MDP, and allows for an elegant solution through a simple linear programme. The algorithm also extends to the general omega-regular properties and runs in time polynomial in the sizes of the MDP as well as the LDBA.
