2018
DOI: 10.1137/16m1100204

Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback--Leibler Control Cost

Abstract: A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a scalar $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{h^*_\zeta : \zeta \in \mathbb{R}\}$ is the solution to an ODE, $\frac{d}{d\zeta} h^*_\zeta = V(h^*_\zeta)$, where the vector field $V : \mathbb{R}^d \to \mathbb{R}^d$ has a simple form, based on a matrix inverse. This general methodology is applied to a family of average-cost optimal control models in…
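To make the idea of a family of MDPs indexed by ζ concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the abstract: the one-step reward is taken to be ζ·U(x) minus the Kullback-Leibler divergence of the controlled transition row from a nominal kernel `P0`, and each member of the family is solved separately through a Perron-Frobenius eigenvector (the "linearly solvable" structure mentioned in the citation statements), not through the ODE of the paper. The names `twisted_kernel`, `P0`, and `U` are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

def twisted_kernel(P0, U, zeta):
    """Solve one member of the family, assuming the reward
    zeta * U(x) - KL(P(x, .) || P0(x, .)) (an illustrative assumption).

    Returns the optimal kernel, the average reward eta, and a relative
    value function h normalized so that h[0] = 0.
    """
    # Scale row x of the nominal kernel by exp(zeta * U(x)).
    P_hat = np.exp(zeta * U)[:, None] * P0
    eigvals, eigvecs = np.linalg.eig(P_hat)
    k = np.argmax(eigvals.real)            # Perron-Frobenius eigenvalue
    lam = eigvals[k].real
    v = np.abs(eigvecs[:, k].real)         # positive right eigenvector
    # Twisted kernel: P(x, y) = P_hat(x, y) * v(y) / (lam * v(x)).
    P_opt = P_hat * v[None, :] / (lam * v[:, None])
    h = np.log(v) - np.log(v[0])
    return P_opt, np.log(lam), h

# Hypothetical 3-state nominal kernel and utility function.
P0 = np.array([[0.5, 0.3, 0.2],
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]])
U = np.array([0.0, 1.0, 2.0])

# Sweep zeta to trace out the family of solutions.
for zeta in (0.0, 0.5, 1.0):
    P_opt, eta, h = twisted_kernel(P0, U, zeta)
    print(f"zeta={zeta:.1f}  eta={eta:.4f}  h={np.round(h, 4)}")
```

At ζ = 0 the sketch returns the nominal kernel itself, which is a quick sanity check. Recomputing an eigenvector for every ζ is the brute-force alternative to the approach described in the abstract, which instead follows the value function along ζ by integrating an ODE.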

Cited by 12 publications (16 citation statements)
References 31 publications
“…For more history the reader is referred to [36,14], in addition to the papers surveyed in Section 3.2. While beyond the scope of this article, it is important to note that Todorov's 'linearly solvable' MDP model [44] is similar to prior work such as [24], and the form of the solution could have been anticipated from well-known results in the theory of large-deviations for Markov chains [7]. It is pointed out in [45] that this approach has a long history in the context of controlled stochastic differential equations [18].…”
Section: Grid Actuation (mentioning)
confidence: 99%
“…Consider a load model in which the full state space is the Cartesian product $X = X_u \times X_n$, where $X_u$ are components of the state that can be directly manipulated through control. In prior work [7,6], the following conditional-independence structure is assumed: for each state $x = (x_u, x_n)$, and each $\zeta \in \mathbb{R}$,…”
Section: Uncontrolled Dynamics (mentioning)
confidence: 99%
“…The main contribution of this paper is to demonstrate that the local control designs introduced in [3], [4], [21] admit practical extension to continuous-state models. This is true even in the significantly more complex setting in which the nominal dynamics include stochastic disturbances that are outside of direct control (such as variations in ambient temperature, or inlet temperature of water to the TCL).…”
Section: Introduction (mentioning)
confidence: 99%