2017
DOI: 10.1007/978-3-319-63387-9_10
Value Iteration for Long-Run Average Reward in Markov Decision Processes

Abstract: Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this wor…
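To make the contrast drawn in the abstract concrete, below is a minimal sketch of classical value iteration for a reachability objective, the kind of property for which VI is routinely used with a simple "small change between iterations" stopping test. The sketch is not taken from the paper; the example MDP, state names, and tolerance are invented for illustration.

```python
# Hypothetical MDP: transitions[state][action] = list of (successor, probability).
transitions = {
    "s0": {"a": [("s1", 0.5), ("s0", 0.5)], "b": [("s2", 1.0)]},
    "s1": {"a": [("goal", 1.0)]},
    "s2": {"a": [("s2", 1.0)]},          # sink from which the goal is unreachable
    "goal": {"a": [("goal", 1.0)]},
}
target = {"goal"}

def reachability_value_iteration(transitions, target, eps=1e-8):
    # V(s) approximates the maximal probability of eventually reaching the target.
    V = {s: (1.0 if s in target else 0.0) for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            if s in target:
                continue
            best = max(
                sum(p * V[t] for t, p in succ)
                for succ in transitions[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:       # the common "small change" test used for reachability
            return V

print(reachability_value_iteration(transitions, target))
```

For long-run average rewards, the abstract's point is that no analogous stopping criterion is known for a naive extension of this loop.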


citations
Cited by 32 publications
(50 citation statements)
references
References 31 publications
0
50
0
Order By: Relevance
“…When dealing with Markov Chains in queries, we only consider finite state sets. 2 In other words, each action is associated with exactly one state.…”
Section: Markov Decision Processes (mentioning)
confidence: 99%
“…For the MDP case, recall that simple expectation maximization of mean payoff can be reduced to weighted reachability [2] and deterministic, memoryless strategies are optimal [31]. Yet, solving a conjunctive query involving either VaR or CVaR needs more powerful strategies than in the weighted reachability case of Thm.…”
Section: Mean Payoff (mentioning)
confidence: 99%
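A hedged sketch of the reduction mentioned in the preceding quote: if the end components of an MDP have already been collapsed into absorbing states carrying their best achievable gain, maximizing the expected mean payoff reduces to weighted reachability over those values. All state names, weights, and transitions below are hypothetical.

```python
# End components collapsed into absorbing states mec1/mec2 with known values;
# what remains is a weighted reachability problem over those values.
weights = {"mec1": 3.0, "mec2": 5.0}
transitions = {
    "s0": {"a": [("mec1", 1.0)], "b": [("s1", 0.5), ("mec2", 0.5)]},
    "s1": {"a": [("mec2", 1.0)]},
}

def weighted_reachability_vi(transitions, weights, iterations=100):
    # V(s): maximal expected value of the absorbing state eventually reached.
    V = {s: 0.0 for s in transitions}
    V.update(weights)                    # absorbing targets keep their fixed weight
    for _ in range(iterations):
        for s in transitions:
            V[s] = max(
                sum(p * V[t] for t, p in succ)
                for succ in transitions[s].values()
            )
    return V

print(weighted_reachability_vi(transitions, weights))
# e.g. {'s0': 5.0, 's1': 5.0, 'mec1': 3.0, 'mec2': 5.0}
```

In this toy instance a deterministic, memoryless choice (action b in s0) already attains the optimum, in line with the optimality of such strategies noted in the quote.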
“…Usually, the dynamic and stochastic VRP is modelled either as a Markov decision process or as a stochastic program [1]. An MDP consists of a finite set of states, a finite set of actions, representing the nondeterministic choices, and a transition function that given a state and an action provides the probability distribution over the successor states [2]. Differently, SP determines a feasible solution for all possible outcomes [3].…”
Section: Introduction (mentioning)
confidence: 99%
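As a companion to the definition quoted above, here is a minimal sketch of how such an MDP (finite states, finite actions modelling the nondeterministic choices, and a transition function returning a distribution over successors) could be represented; the class name, field names, and example instance are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MDP:
    states: frozenset     # finite set of states
    actions: frozenset    # finite set of actions (nondeterministic choices)
    delta: dict           # (state, action) -> {successor: probability}

    def distribution(self, state, action):
        # Probability distribution over successor states for a state-action pair.
        return self.delta.get((state, action), {})

mdp = MDP(
    states=frozenset({"s0", "s1"}),
    actions=frozenset({"a", "b"}),
    delta={
        ("s0", "a"): {"s0": 0.5, "s1": 0.5},
        ("s0", "b"): {"s1": 1.0},
        ("s1", "a"): {"s1": 1.0},
    },
)
print(mdp.distribution("s0", "a"))   # {'s0': 0.5, 's1': 0.5}
```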