2021
DOI: 10.1007/978-3-030-81688-9_30

Model-Free Reinforcement Learning for Branching Markov Decision Processes

Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to s…
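The BMDP described in the abstract (typed entities that each collect a payoff and spawn offspring, with a controller choosing an option per entity) can be made concrete with a small sketch. Everything in it is hypothetical: the types `A` and `B`, their actions, rewards and offspring distributions, and the objective (expected total discounted reward) are assumptions for illustration, not taken from the paper. The model-based value iteration shown is also not the paper's model-free method. It relies on linearity of expectation: a population's value is the sum of its entities' values, so the Bellman equation can be written per type. It further assumes that the discount factor times the largest expected number of children under any action is below 1 (here 0.6 × 1), which makes the update a contraction. A seeded Monte Carlo rollout of the greedy policy then checks the computed value by sampling.

```python
import random

# Hypothetical BMDP (assumed for illustration): each entity type maps to its
# actions; each action is a list of (probability, reward, children) outcomes,
# where `children` are the entity types that replace the current entity.
BMDP = {
    "A": {
        "grow": [(0.5, 1.0, ("A", "B")), (0.5, 0.0, ())],
        "stay": [(1.0, 0.5, ("A",))],
    },
    "B": {
        "work": [(0.7, 2.0, ()), (0.3, 0.0, ("B",))],
    },
}
# Assumed discount factor; GAMMA * (max expected number of children) < 1 here,
# so the Bellman update below is a contraction.
GAMMA = 0.6


def q_value(outcomes, v, gamma):
    """Expected reward now plus the discounted values of all spawned children."""
    return sum(p * (r + gamma * sum(v[c] for c in kids)) for p, r, kids in outcomes)


def value_iteration(bmdp, gamma, iters=500):
    """Optimal expected discounted reward of a single entity of each type.

    By linearity of expectation a population's value is the sum of its
    entities' values, so the Bellman equation is stated per type:
    v(q) = max_a sum_p p * (r + gamma * sum_{child} v(child)).
    """
    v = {q: 0.0 for q in bmdp}
    for _ in range(iters):
        v = {q: max(q_value(o, v, gamma) for o in acts.values()) for q, acts in bmdp.items()}
    return v


def greedy_policy(bmdp, v, gamma):
    """Pick, for every type, the action with the highest one-step lookahead value."""
    return {
        q: max(acts, key=lambda a: q_value(acts[a], v, gamma))
        for q, acts in bmdp.items()
    }


def rollout(bmdp, policy, root, gamma, rng, horizon=60):
    """Sample one run from a single root entity.

    Entities in generation t have their rewards discounted by gamma**t.
    """
    population, total = [root], 0.0
    for t in range(horizon):
        if not population:  # every entity has died out
            break
        next_population = []
        for q in population:
            outcomes = bmdp[q][policy[q]]
            _, r, kids = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
            total += gamma**t * r
            next_population.extend(kids)
        population = next_population
    return total


v = value_iteration(BMDP, GAMMA)
policy = greedy_policy(BMDP, v, GAMMA)
rng = random.Random(0)
est = sum(rollout(BMDP, policy, "A", GAMMA, rng) for _ in range(20000)) / 20000
print(policy, round(v["A"], 3), round(est, 3))
```

In this toy instance the greedy policy picks `grow` for `A` and `work` for `B`, and the sampled average should land close to `v["A"]`. A model-free approach would instead have to estimate such values from sampled runs alone, without access to the outcome table.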

Cited by 2 publications (3 citation statements)
References: 25 publications
“…Due to this step, Mungojerrie has been connected to external linear program solvers. This enabled the extension of Mungojerrie to compute reward maximizing policies via a linear program for branching Markov decision processes in [18].…”
Section: Tool Design (mentioning)
confidence: 99%
“…We also refer readers to [26,Fig. 3] which examined RL for scLTL properties, [6] for continuous-time MDPs, and [18], which extended Mungojerrie to test model-free reinforcement learning in branching Markov decision processes.…”
Section: Case Studies (mentioning)
confidence: 99%
“…Being able to handle games also paves the way for using alternating automata (so long as they are good-for-MDPs) for ordinary MDPs, which has proven to allow for efficient translations from deterministic Streett to alternating Büchi automata that are good-for-MDPs, while their translation to nondeterministic Büchi automata (GFM or not) is expensive [9].…”
Section: Related Work (mentioning)
confidence: 99%