2017
DOI: 10.1016/j.peva.2017.08.007

Policy learning in continuous-time Markov decision processes using Gaussian Processes

Abstract: Continuous-time Markov decision processes provide a very powerful mathematical framework to solve policy-making problems in a wide range of applications, ranging from the control of populations to cyber–physical systems. The key problem to solve for these models is to efficiently compute an optimal policy to control the system in order to maximise the probability of satisfying a set of temporal logic specifications. Here we introduce a novel method based on statistical model checking and an unbiased estimation…
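
The abstract's method rests on estimating, by repeated simulation, the probability that the controlled process satisfies a specification. Below is a minimal statistical-model-checking sketch of that idea in Python; the two-state model, rates, and policy are illustrative assumptions, not the paper's actual model or its unbiased estimator.

```python
import math
import random

def simulate_once(policy, horizon=10.0):
    """One trajectory of a hypothetical two-state CTMDP; returns True if
    the goal state is reached before the time horizon."""
    state, t = 0, 0.0
    while t < horizon:
        action = policy(state)
        rate = 1.0 + 0.5 * action          # exit rate depends on the action
        t += random.expovariate(rate)      # exponential sojourn time
        if t >= horizon:
            break
        if random.random() < 0.3 + 0.2 * action:
            return True                    # jumped to the goal state
    return False

def estimate_satisfaction(policy, runs=10_000):
    """Monte Carlo estimate of the satisfaction probability with a
    95% normal-approximation confidence interval."""
    hits = sum(simulate_once(policy) for _ in range(runs))
    p = hits / runs
    half = 1.96 * math.sqrt(p * (1 - p) / runs)
    return p, half

p, half = estimate_satisfaction(lambda s: 1)
print(f"P(sat) ~= {p:.3f} +/- {half:.3f}")
```

A policy-learning loop would wrap such an estimator in an optimiser (the paper's title points to Gaussian-process-based optimisation); the sketch shows only the estimation step.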

Cited by 9 publications (8 citation statements) · References 47 publications (85 reference statements)
“…, using the Gillespie algorithm. This is done in the context of statistical model checking [11] and has recently been applied to learning effective time-dependent policies for CT-MDPs [12]. Here, we argue that in the case of some reward functionals a good estimate can be achieved via fluid approximation.…”
Section: Policy Synthesis via Fluid Approximation
confidence: 92%
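
The excerpt weighs two estimators for population CTMCs: exact stochastic simulation (Gillespie) and the fluid, mean-field limit. The sketch below compares the two on an assumed immigration-death model whose fluid limit is dx/dt = birth − death·x; the model and rates are illustrative, not taken from the cited works.

```python
import random

BIRTH, DEATH, T_END = 10.0, 1.0, 5.0  # hypothetical rates and horizon

def gillespie(x0=0):
    """One exact stochastic (SSA) trajectory; returns the population at T_END."""
    t, x = 0.0, x0
    while True:
        total = BIRTH + DEATH * x          # total event rate (propensity sum)
        t += random.expovariate(total)     # time to the next event
        if t >= T_END:
            return x
        x += 1 if random.random() < BIRTH / total else -1

def fluid(x0=0.0, dt=1e-3):
    """Euler integration of the mean-field ODE dx/dt = BIRTH - DEATH * x."""
    x = x0
    for _ in range(int(T_END / dt)):
        x += (BIRTH - DEATH * x) * dt
    return x

runs = 2000
ssa_mean = sum(gillespie() for _ in range(runs)) / runs
print(f"SSA mean ~= {ssa_mean:.2f}, fluid approximation ~= {fluid():.2f}")
```

For this model the fluid value tracks the simulated mean closely, which is the kind of agreement the excerpt's argument relies on; for properties dominated by stochastic fluctuations the two estimators can diverge.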
“…Obtain the probabilities of M1, M2, and A; the probability of M3 can then be calculated as 0.657. Continuous-time Markov decision processes provide a very powerful mathematical framework to solve widely used decision problems, as discussed by Bartocci [31]. The process can be defined as the following random process:…”
Section: A Case of Study
confidence: 99%
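
The excerpt's definition is cut off, but the standard way to read a CTMDP as a random process is: each state-action pair induces exponential transition rates, the sojourn time is exponential in the total exit rate, and the successor is chosen proportionally to its rate (race semantics). Here is a minimal sketch with a hypothetical two-state, two-action rate function; this is the textbook construction, not necessarily the one used in the citing paper.

```python
import random

# Hypothetical rate function: R[s][a] maps successor state -> rate.
R = {
    0: {0: {1: 0.5}, 1: {1: 1.5}},
    1: {0: {0: 0.2}, 1: {0: 0.8}},
}

def step(state, action):
    """One jump under race semantics: exponential sojourn time with the
    total exit rate, successor drawn proportionally to its own rate."""
    rates = R[state][action]
    total = sum(rates.values())
    dwell = random.expovariate(total)
    pick, acc = random.random() * total, 0.0
    for successor, rate in rates.items():
        acc += rate
        if pick <= acc:
            return successor, dwell
    return successor, dwell  # floating-point fallback

print(step(0, 1))  # e.g. (1, 0.42...): jump to state 1 after ~Exp(1.5) time
```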
“…RV of multiagent systems using trace expressions [9,88] is also a viable approach to cope with the Rules Layer. RV has been used for the Reactions Layer, for example in [19]. While testing has been applied to features at both the Reactions Layer and Rules Layer in autonomous [105,227] and multiagent systems [192], we are not aware of proposals for testing the Principles Layer of an autonomous system.…”
Section: Verification of Autonomous Software Systems
confidence: 99%