2020
DOI: 10.48550/arxiv.2006.09646
Preprint

Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

Amber Srivastava,
Srinivasa M Salapaka

Abstract: We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is …


Cited by 1 publication (2 citation statements)
References 28 publications
“…\(\mu_{a|s}\, p^{a}_{ss'}\big[c^{a,\mu}_{ss'} + \gamma V^{\mu}_{\beta\Upsilon}(s')\big] + c_0(s)\), (7) where \(\mu_{a|s} = \mu(a|s)\), \(p^{a}_{ss'} = p(s'|s,a)\), and \(c^{a,\mu}_{ss'} = c(s,a,s') + \tfrac{\gamma}{\beta}\log p(s'|s,a) + \tfrac{\gamma}{\beta}\log \mu_{a|s}\) for simplicity in notation, and \(c_0(s)\) depends on \(\gamma\) and \(\beta\) and is independent of the policy \(\mu\) and the parameters \(\Upsilon\). Without loss of generality, we ignore \(c_0(s)\) in the subsequent calculations (see [23]). For a proof of the above Bellman equation, see Theorem 1 in [12] (or the detailed proof in [23]).…”
Section: MEP-Based Approach to Static Para-SDM (mentioning)
confidence: 99%
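The quoted passage is an entropy-augmented Bellman recursion for the free-energy value function. A minimal sketch of one such backup is given below, assuming a tabular MDP stored as NumPy arrays; the function name, array shapes, and the decision to drop \(c_0(s)\) (as the quote itself does) are illustrative assumptions, not details taken from [12] or [23].

```python
import numpy as np

def free_energy_bellman_backup(V, mu, P, C, gamma, beta):
    """One backup of the free-energy value function V^mu_{beta,Upsilon},
    following the structure of the quoted equation (7).

    Assumed shapes (illustrative, not from the cited papers):
      V  : (S,)       current value estimate
      mu : (S, A)     policy mu(a|s)
      P  : (S, A, S)  transition kernel p(s'|s, a)
      C  : (S, A, S)  stage cost c(s, a, s')
    The policy-independent constant c_0(s) is dropped, as in the quote.
    """
    S, A, _ = P.shape
    V_new = np.zeros(S)
    for s in range(S):
        total = 0.0
        for a in range(A):
            if mu[s, a] == 0.0:
                continue
            for s_next in range(S):
                if P[s, a, s_next] == 0.0:
                    continue
                # augmented cost c^{a,mu}_{ss'} = c + (gamma/beta) log p + (gamma/beta) log mu
                c_aug = (C[s, a, s_next]
                         + (gamma / beta) * np.log(P[s, a, s_next])
                         + (gamma / beta) * np.log(mu[s, a]))
                total += mu[s, a] * P[s, a, s_next] * (c_aug + gamma * V[s_next])
        V_new[s] = total
    return V_new
```

Iterating this backup to a fixed point would approximate \(V^{\mu}_{\beta\Upsilon}\) for a fixed policy \(\mu\), in the spirit of standard policy evaluation.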
“…Without loss of generality, we ignore \(c_0(s)\) in the subsequent calculations (see [23]). For a proof of the above Bellman equation, see Theorem 1 in [12] (or the detailed proof in [23]). The optimal policy \(\mu^{*}_{\beta}\) is obtained by setting…”
Section: MEP-Based Approach to Static Para-SDM (mentioning)
confidence: 99%
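The last quoted sentence is truncated, so the exact stationarity condition is not reproduced here; MEP-based formulations typically arrive at a Gibbs (softmax) form for the optimal policy. A minimal sketch under that assumption follows, where the state-action cost-to-go array `Lambda` is a hypothetical placeholder rather than a quantity defined in the quoted text.

```python
import numpy as np

def gibbs_policy(Lambda, beta):
    """Hedged sketch of a Gibbs-form policy mu*(a|s) proportional to
    exp(-beta * Lambda[s, a]), the form MEP-based frameworks typically
    obtain; Lambda (shape (S, A)) is an assumed cost-to-go placeholder."""
    logits = -beta * Lambda
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)
```

As \(\beta\) grows, such a policy concentrates on the minimum-cost action; as \(\beta \to 0\) it approaches a uniform distribution, matching the usual maximum-entropy annealing interpretation.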