2016 IEEE 55th Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2016.7799360

Robust optimal policies for Markov decision processes with safety-threshold constraints

Abstract: We study the synthesis of robust optimal control policies for Markov decision processes with transition uncertainty (UMDPs) subject to two types of constraints: (i) constraints on the worst-case, maximal total cost and (ii) safety-threshold constraints that bound the worst-case probability of visiting a set of error states. For maximal total cost constraints, we propose a state-augmentation method and a two-step synthesis algorithm to generate deterministic, memoryless optimal policies given the rew…
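For intuition about the robust planning problem the abstract describes, the following is a minimal, hypothetical sketch of worst-case (robust) value iteration on an MDP whose transition probabilities are only known up to intervals. The toy model, the function names, and the greedy inner maximization over the interval uncertainty set are illustrative assumptions on my part; they are not the paper's algorithm, which additionally handles total-cost and safety-threshold constraints via state augmentation.

```python
# Hypothetical sketch: robust value iteration for an interval-uncertain MDP.
# For each state-action pair, an adversary picks, within the given probability
# intervals, the distribution that maximizes expected total cost; the
# controller then minimizes the resulting worst-case value.
import numpy as np

# Toy UMDP: 3 states (0, 1 = goal, 2 = error), 2 actions.
# P_lo[s, a, s'] and P_hi[s, a, s'] bound the unknown transition probability.
P_lo = np.array([
    [[0.6, 0.2, 0.0], [0.3, 0.4, 0.1]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
])
P_hi = np.array([
    [[0.8, 0.4, 0.1], [0.5, 0.6, 0.2]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
])
cost = np.array([[1.0, 2.0], [0.0, 0.0], [5.0, 5.0]])  # stage cost c(s, a)

def worst_case_expectation(lo, hi, values):
    """Adversarial choice: assign as much probability as the intervals allow
    to the most expensive successors, keeping the total equal to one."""
    p = lo.copy()
    slack = 1.0 - p.sum()
    for s_next in np.argsort(values)[::-1]:          # most expensive first
        add = min(hi[s_next] - lo[s_next], slack)
        p[s_next] += add
        slack -= add
    return float(p @ values)

def robust_value_iteration(gamma=0.95, tol=1e-8, max_iter=10_000):
    n_states, n_actions = cost.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.array([
            [cost[s, a] + gamma * worst_case_expectation(P_lo[s, a], P_hi[s, a], V)
             for a in range(n_actions)]
            for s in range(n_states)
        ])
        V_new = Q.min(axis=1)             # controller minimizes worst-case cost
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)            # robust values and a memoryless policy

if __name__ == "__main__":
    V, policy = robust_value_iteration()
    print("worst-case values:", V, "policy:", policy)
```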

Cited by 6 publications (3 citation statements); References 12 publications.
“…Without considering any constraints, the first experiment is designed to observe the relation between the weighting parameters and the approximation error and to justify the choice of weights in (4). The planning objective is to find an approximately optimal policy that drives the robot from the initial position s_init = [0, 0] to the goal s_goal = [8, 10] while avoiding the obstacles. The reward is defined as follows: the robot receives a reward of 100 if P(s_goal | s, a) > 0.5.…”
Section: A. Planning Without PCTL Constraints
confidence: 99%
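The reward rule quoted in that statement can be written as a one-line check. This is a hypothetical sketch, not the cited paper's code; the grid coordinates, the zero reward for all other transitions, and the `transition_prob` interface are assumptions.

```python
# Hypothetical sketch of the quoted reward rule: reward 100 when an action's
# probability of reaching the goal cell exceeds 0.5, and 0 otherwise.
GOAL = (8, 10)

def reward(state, action, transition_prob):
    """transition_prob(s, a, s_next) -> probability of reaching s_next from s under a."""
    return 100.0 if transition_prob(state, action, GOAL) > 0.5 else 0.0
```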
“…One class of research develops policies that maximize the probability of satisfying given temporal logic specifications [1], [2], [3], [4], [5], [6], [7]. Another class of work considers multiple objectives, including soft constraints (maximizing the total reward) and hard constraints (satisfying safety properties) [8], [9]. Among these works, Wang et al. [3] devised the first Approximate Dynamic Programming (ADP) method to solve the problem of maximizing the probability of satisfying temporal logic constraints.…”
Section: Introduction
confidence: 99%
“…Most of the aforementioned work [4], [17], [19]-[22], [27] relies on the assumption that the product automaton contains at least one AEC. However, in many situations this assumption does not hold, and the probability of satisfying the task under any policy is zero.…”
Section: Introduction
confidence: 99%