2019
DOI: 10.1007/978-3-030-17462-0_27

Omega-Regular Objectives in Model-Free Reinforcement Learning

Abstract: We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead …
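
As a rough illustration of the reduction the abstract describes, the sketch below composes an MDP with a limit-deterministic Büchi automaton (LDBA) and exposes the automaton's ε-jump into its accepting component as an extra action of the product. The dictionary encodings, state names, and the `product_successors` helper are hypothetical toy choices for readability, not the authors' implementation.

```python
# Toy sketch only: product of an MDP with a limit-deterministic Buchi
# automaton (LDBA), with the epsilon-jump into the accepting component
# exposed as an extra product action.  All data below is invented.

# MDP: state -> action -> list of (next_state, probability).
mdp = {
    's0': {'a': [('s0', 0.5), ('s1', 0.5)], 'b': [('s1', 1.0)]},
    's1': {'a': [('s0', 1.0)],              'b': [('s1', 1.0)]},
}
# Labeling: atomic propositions that hold in each MDP state.
label = {'s0': frozenset(), 's1': frozenset({'p'})}

# LDBA transitions: (automaton_state, label) -> next automaton state.
# 'q0' lies in the initial component, 'q1' in the deterministic accepting one.
delta = {
    ('q0', frozenset()): 'q0', ('q0', frozenset({'p'})): 'q0',
    ('q1', frozenset({'p'})): 'q1',   # accepting self-loop while p holds
}
epsilon_jumps = {'q0': ['q1']}        # the jump the agent must learn to time

def product_successors(state, action):
    """Successor distribution of a (mdp_state, automaton_state) pair."""
    s, q = state
    if action == 'jump':              # epsilon-jump: automaton moves, MDP stays
        return [((s, q2), 1.0) for q2 in epsilon_jumps.get(q, [])]
    succs = []
    for s2, prob in mdp[s][action]:
        q2 = delta.get((q, label[s2]))
        if q2 is not None:            # undefined entries act as rejecting sinks
            succs.append(((s2, q2), prob))
    return succs

print(product_successors(('s0', 'q0'), 'a'))     # MDP move, automaton tracks labels
print(product_successors(('s0', 'q0'), 'jump'))  # cross into the accepting component
```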

Cited by 86 publications (126 citation statements); References 27 publications
“…In the context of RL, techniques based on SLDBAs are particularly useful, because these automata use the Büchi acceptance condition, which can be translated to reachability goals. Good-for-games and deterministic automata require more complex acceptance conditions, like parity, that do not have a natural translation into rewards [13].…”
Section: Good-for-MDP (GFM) Automata
confidence: 99%
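
The translation mentioned in the statement above, from a Büchi acceptance condition to a reachability goal, can be pictured with the following minimal sketch: each accepting transition of the product is redirected with a small probability 1 − ζ to an absorbing target state, and reaching that target is rewarded. The function name, the `zeta` default, and the dictionary encoding are assumptions made for illustration; this is in the spirit of, not verbatim from, the construction in [13].

```python
# Illustrative gadget: redirect every accepting Buchi transition of the
# product MDP to an absorbing target state with probability 1 - zeta, so
# that the Buchi condition becomes plain reachability of `target`.
# All names here are assumptions.

def add_reachability_gadget(transitions, accepting, zeta=0.99, target='GOAL'):
    """transitions: dict (state, action) -> list of (next_state, prob)
       accepting:   set of (state, action) pairs that are Buchi-accepting
       Returns a new transition table in which accepting moves lead to the
       absorbing `target` with probability 1 - zeta; rewarding the first
       visit to `target` turns the objective into reachability."""
    out = {}
    for (s, a), succs in transitions.items():
        if (s, a) in accepting:
            out[(s, a)] = [(t, p * zeta) for t, p in succs] + [(target, 1 - zeta)]
        else:
            out[(s, a)] = list(succs)
    out[(target, 'stay')] = [(target, 1.0)]   # target is absorbing
    return out

# Tiny usage example: a single accepting self-loop.
prod = {(('s', 'q_acc'), 'a'): [(('s', 'q_acc'), 1.0)]}
acc = {(('s', 'q_acc'), 'a')}
print(add_reachability_gadget(prod, acc, zeta=0.9))
```

With such a gadget in place, the ω-regular objective is expressed as a scalar reachability reward that standard model-free RL algorithms can maximize.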
“…SLDBAs have been used in [13] for model-free reinforcement learning of ω-regular objectives. While the Büchi acceptance condition allows for a faithful translation of the objective to a scalar reward, the agent has to learn how to control the automaton's nondeterministic choices; that is, the agent has to learn when the SLDBA should cross from the initial component to the accepting component to produce a successful run of a behavior that satisfies the given objective.…”
Section: GFM Automata and Reinforcement Learning
confidence: 99%
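
The point made in the statement above, that the agent itself must learn when to take the automaton's jump into the accepting component, can be pictured with a small tabular Q-learning toy. Everything below (the product MDP, the ZETA constant, the state and action names) is a hypothetical illustration, not the setup of [13] or [28].

```python
# Hypothetical end-to-end toy: tabular Q-learning on a tiny product MDP in
# which the LDBA's epsilon-jump is just another action ('jump'), so the
# agent learns when to cross into the accepting component.

import random

ZETA, GAMMA, ALPHA, EPS = 0.95, 0.9, 0.1, 0.2

# Product states: (mdp_state, component); 'goal' is the absorbing target
# produced by the Buchi-to-reachability gadget sketched earlier.
TRANS = {
    ('s0', 'init'): {'a': [(('s0', 'init'), 1.0)],
                     'jump': [(('s0', 'acc'), 1.0)]},
    ('s0', 'acc'):  {'a': [(('s0', 'acc'), ZETA), ('goal', 1 - ZETA)]},
    'goal':         {'a': [('goal', 1.0)]},
}

def step(state, action):
    """Sample a successor; pay reward 1 on first entry to 'goal'."""
    succs = TRANS[state][action]
    r, total, nxt = random.random(), 0.0, succs[-1][0]
    for cand, p in succs:
        total += p
        if r <= total:
            nxt = cand
            break
    return nxt, (1.0 if nxt == 'goal' and state != 'goal' else 0.0)

Q = {s: {a: 0.0 for a in acts} for s, acts in TRANS.items()}

for _ in range(2000):                          # episodes
    state = ('s0', 'init')
    for _ in range(50):                        # steps per episode
        acts = list(Q[state])
        a = random.choice(acts) if random.random() < EPS \
            else max(acts, key=Q[state].get)
        nxt, reward = step(state, a)
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt].values()) - Q[state][a])
        state = nxt

print(Q[('s0', 'init')])   # 'jump' should end up with the higher value
```

In this toy the ε-jump is deliberately the only route to the target, which is why its learned Q-value at the initial product state should pull ahead of the self-loop action.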
“…Specifically, [27] propose a hybrid neural network architecture combined with LDBAs to handle MDPs with continuous state spaces. The work in [26] has been taken up more recently by [28], which has focused on model-free aspects of the algorithm and has employed a different LDBA structure and reward, which introduce extra states in the product MDP. The authors also do not discuss the complexity of the automaton construction with respect to the size of the formula, but given that the resulting automaton is not generalised Büchi, it can be expected that the density of the automaton's acceptance condition is quite low, which might result in a state-space explosion, particularly if the LTL formula is complex.…”
Section: Introduction
confidence: 99%