2019
DOI: 10.1007/978-3-030-17462-0_27

Omega-Regular Objectives in Model-Free Reinforcement Learning

Abstract: We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead …
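
As a rough illustration of the reduction the abstract describes, the sketch below composes an MDP with a limit-deterministic Büchi automaton (LDBA) and exposes the automaton's ε-jump into its accepting component as an extra action of the product. The dictionary encodings, state names, and the `product_successors` helper are hypothetical toy choices for readability, not the authors' implementation.

```python
# Toy sketch only: product of an MDP with a limit-deterministic Buchi
# automaton (LDBA), with the epsilon-jump into the accepting component
# exposed as an extra product action.  All data below is invented.

# MDP: state -> action -> list of (next_state, probability).
mdp = {
    's0': {'a': [('s0', 0.5), ('s1', 0.5)], 'b': [('s1', 1.0)]},
    's1': {'a': [('s0', 1.0)],              'b': [('s1', 1.0)]},
}
# Labeling: atomic propositions that hold in each MDP state.
label = {'s0': frozenset(), 's1': frozenset({'p'})}

# LDBA transitions: (automaton_state, label) -> next automaton state.
# 'q0' lies in the initial component, 'q1' in the deterministic accepting one.
delta = {
    ('q0', frozenset()): 'q0', ('q0', frozenset({'p'})): 'q0',
    ('q1', frozenset({'p'})): 'q1',   # accepting self-loop while p holds
}
epsilon_jumps = {'q0': ['q1']}        # the jump the agent must learn to time

def product_successors(state, action):
    """Successor distribution of a (mdp_state, automaton_state) pair."""
    s, q = state
    if action == 'jump':              # epsilon-jump: automaton moves, MDP stays
        return [((s, q2), 1.0) for q2 in epsilon_jumps.get(q, [])]
    succs = []
    for s2, prob in mdp[s][action]:
        q2 = delta.get((q, label[s2]))
        if q2 is not None:            # undefined entries act as rejecting sinks
            succs.append(((s2, q2), prob))
    return succs

print(product_successors(('s0', 'q0'), 'a'))     # MDP move, automaton tracks labels
print(product_successors(('s0', 'q0'), 'jump'))  # cross into the accepting component
```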

Cited by 86 publications (126 citation statements); References 27 publications
“…In the context of RL, techniques based on SLDBAs are particularly useful, because these automata use the Büchi acceptance condition, which can be translated to reachability goals. Good-for-games and deterministic automata require more complex acceptance conditions, like parity, that do not have a natural translation into rewards [13].…”
Section: Good-for-MDP (GFM) Automata
confidence: 99%
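
The translation mentioned in the statement above, from a Büchi acceptance condition to a reachability goal, can be pictured with the following minimal sketch: each accepting transition of the product is redirected with a small probability 1 − ζ to an absorbing target state, and reaching that target is rewarded. The function name, the `zeta` default, and the dictionary encoding are assumptions made for illustration; this is in the spirit of, not verbatim from, the construction in [13].

```python
# Illustrative gadget: redirect every accepting Buchi transition of the
# product MDP to an absorbing target state with probability 1 - zeta, so
# that the Buchi condition becomes plain reachability of `target`.
# All names here are assumptions.

def add_reachability_gadget(transitions, accepting, zeta=0.99, target='GOAL'):
    """transitions: dict (state, action) -> list of (next_state, prob)
       accepting:   set of (state, action) pairs that are Buchi-accepting
       Returns a new transition table in which accepting moves lead to the
       absorbing `target` with probability 1 - zeta; rewarding the first
       visit to `target` turns the objective into reachability."""
    out = {}
    for (s, a), succs in transitions.items():
        if (s, a) in accepting:
            out[(s, a)] = [(t, p * zeta) for t, p in succs] + [(target, 1 - zeta)]
        else:
            out[(s, a)] = list(succs)
    out[(target, 'stay')] = [(target, 1.0)]   # target is absorbing
    return out

# Tiny usage example: a single accepting self-loop.
prod = {(('s', 'q_acc'), 'a'): [(('s', 'q_acc'), 1.0)]}
acc = {(('s', 'q_acc'), 'a')}
print(add_reachability_gadget(prod, acc, zeta=0.9))
```

With such a gadget in place, the ω-regular objective is expressed as a scalar reachability reward that standard model-free RL algorithms can maximize.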
“…SLDBAs have been used in [13] for model-free reinforcement learning of ω-regular objectives. While the Büchi acceptance condition allows for a faithful translation of the objective to a scalar reward, the agent has to learn how to control the automaton's nondeterministic choices; that is, the agent has to learn when the SLDBA should cross from the initial component to the accepting component to produce a successful run of a behavior that satisfies the given objective.…”
Section: GFM Automata and Reinforcement Learning
confidence: 99%
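
The point made in the statement above, that the agent itself must learn when to take the automaton's jump into the accepting component, can be pictured with a small tabular Q-learning toy. Everything below (the product MDP, the ZETA constant, the state and action names) is a hypothetical illustration, not the setup of [13] or [28].

```python
# Hypothetical end-to-end toy: tabular Q-learning on a tiny product MDP in
# which the LDBA's epsilon-jump is just another action ('jump'), so the
# agent learns when to cross into the accepting component.

import random

ZETA, GAMMA, ALPHA, EPS = 0.95, 0.9, 0.1, 0.2

# Product states: (mdp_state, component); 'goal' is the absorbing target
# produced by the Buchi-to-reachability gadget sketched earlier.
TRANS = {
    ('s0', 'init'): {'a': [(('s0', 'init'), 1.0)],
                     'jump': [(('s0', 'acc'), 1.0)]},
    ('s0', 'acc'):  {'a': [(('s0', 'acc'), ZETA), ('goal', 1 - ZETA)]},
    'goal':         {'a': [('goal', 1.0)]},
}

def step(state, action):
    """Sample a successor; pay reward 1 on first entry to 'goal'."""
    succs = TRANS[state][action]
    r, total, nxt = random.random(), 0.0, succs[-1][0]
    for cand, p in succs:
        total += p
        if r <= total:
            nxt = cand
            break
    return nxt, (1.0 if nxt == 'goal' and state != 'goal' else 0.0)

Q = {s: {a: 0.0 for a in acts} for s, acts in TRANS.items()}

for _ in range(2000):                          # episodes
    state = ('s0', 'init')
    for _ in range(50):                        # steps per episode
        acts = list(Q[state])
        a = random.choice(acts) if random.random() < EPS \
            else max(acts, key=Q[state].get)
        nxt, reward = step(state, a)
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt].values()) - Q[state][a])
        state = nxt

print(Q[('s0', 'init')])   # 'jump' should end up with the higher value
```

In this toy the ε-jump is deliberately the only route to the target, which is why its learned Q-value at the initial product state should pull ahead of the self-loop action.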
“…Specifically, [27] propose a hybrid neural network architecture combined with LDBAs to handle MDPs with continuous state spaces. The work in [26] has been taken up more recently by [28], which has focused on model-free aspects of the algorithm and has employed a different LDBA structure and reward, which introduce extra states in the product MDP. The authors also do not discuss the complexity of the automaton construction with respect to the size of the formula, but given that the resulting automaton is not generalised Büchi, it can be expected that the density of the automaton's acceptance condition is quite low, which might result in a state-space explosion, particularly if the LTL formula is complex.…”
Section: Introduction
confidence: 99%