2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros45743.2020.9341325

Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas

Abstract: We demonstrate a reinforcement learning agent that uses a compositional recurrent neural network which takes an LTL formula as input and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents: the agent learns from one diverse set of tasks and generalizes to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demo…

Cited by 14 publications (13 citation statements). References 13 publications.
“…Camacho et al [8] show that one can generate RMs from temporal specifications, but RMs generated this way lead to sparse rewards. Kuo et al [23] propose a compositional model for zero-shot execution of LTL formulas, but training such a model requires many samples even in relatively simple environments. There has also been recent work on using temporal logic specifications for multi-agent RL [13,29].…”
Section: Related Work
confidence: 99%
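The sparsity issue noted in that statement can be made concrete: a reward machine is a finite-state machine over propositional labels that emits reward only when an accepting state is reached. Below is a minimal sketch in Python; the class, the transition encoding, and the example formula are illustrative assumptions, not Camacho et al.'s implementation.

```python
# Minimal reward-machine sketch (hypothetical encoding): a finite-state
# machine over propositional labels that pays reward only on reaching an
# accepting state -- which is why naively generated RMs give sparse rewards.

class RewardMachine:
    def __init__(self, transitions, initial, accepting):
        # transitions: {(state, frozenset_of_true_props): next_state}
        self.transitions = transitions
        self.state = initial
        self.accepting = accepting

    def step(self, true_props):
        """Advance on the set of propositions true in the current env state."""
        key = (self.state, frozenset(true_props))
        self.state = self.transitions.get(key, self.state)
        # Sparse reward: nonzero only while in an accepting state.
        return 1.0 if self.state in self.accepting else 0.0

# RM for "eventually a, then eventually b", i.e. F(a & F b):
rm = RewardMachine(
    transitions={(0, frozenset({"a"})): 1, (1, frozenset({"b"})): 2},
    initial=0,
    accepting={2},
)
print(rm.step({"a"}), rm.step(set()), rm.step({"b"}))  # 0.0 0.0 1.0
```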
“…Formally, the first component of the SM is the extractor E, which transforms the complex formula T into a list K consisting of all the sequences of atomic tasks α that satisfy T. As is common in the literature (Kuo et al., 2020; Vaezipoor et al., 2021), we assume that the SM has access to an internal labelling function L_I : Z → 2^AP. L_I is the restriction of L to the observation space and maps the observations of the agent in Z into the set AP of atoms.…”
Section: Neuro-symbolic Agents
confidence: 99%
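A minimal sketch of such an internal labelling function L_I : Z → 2^AP, mapping a raw observation to the set of atomic propositions true in it. The observation format, atom names, and thresholds here are assumptions for illustration, not the cited paper's code.

```python
# Sketch of a labelling function L_I : Z -> 2^AP for a hypothetical
# grid-world observation; atoms and thresholds are made-up examples.

from typing import Set

AP = {"near_goal", "holding_key", "in_lava"}  # atomic propositions

def label(obs: dict) -> Set[str]:
    """Map a raw observation to the subset of AP true in it."""
    atoms = set()
    if obs.get("dist_to_goal", float("inf")) < 1.0:
        atoms.add("near_goal")
    if obs.get("has_key", False):
        atoms.add("holding_key")
    if obs.get("tile") == "lava":
        atoms.add("in_lava")
    return atoms

# A formula's progress is then tracked over label(obs) at each step.
print(label({"dist_to_goal": 0.5, "has_key": True}))
```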
“…Beyond TL, we find methods such as reward machines (a type of finite-state machine) (Xu et al., 2020), or RL-specific formal languages such as SPECTRL (Jothimurugan et al., 2019). Closer to our line of work, Kuo et al. (2020) present a novel RL framework for following out-of-distribution (OOD) combinations of known tasks expressed in linear-time temporal logic (LTL) by training multiple networks (one per LTL operator and per object). In a similar line, Araki et al. (2021) introduce a hierarchical reinforcement learning framework aimed at learning policies that are optimal and composable while relying on different neural networks, each specialized in one subtask.…”
Section: Related Work
confidence: 99%
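The per-operator composition described in that statement can be sketched as follows: one small network per LTL operator and per object, wired together by recursing over a formula's parse tree, so an unseen formula reuses the same trained modules. This sketch assumes PyTorch; the feed-forward cell (in place of the paper's recurrent one), the tuple encoding of formulas, and the averaging fusion are simplifications, not Kuo et al.'s exact model.

```python
# Sketch of composing per-operator networks over an LTL parse tree, in the
# spirit of Kuo et al. (2020). Module sizes, the tuple-based formula
# encoding, and the fusion scheme are illustrative assumptions.

import torch
import torch.nn as nn

OBS_DIM, HID = 16, 32

class OperatorModule(nn.Module):
    """One network per LTL operator/object; child embeddings are averaged in."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + HID, HID), nn.ReLU())

    def forward(self, obs, child_embs):
        kids = torch.stack(child_embs).mean(0) if child_embs else torch.zeros(HID)
        return self.net(torch.cat([obs, kids]))

# One module per operator and per atomic proposition (object), as described.
modules = {name: OperatorModule()
           for name in ["and", "or", "until", "not", "next", "eventually",
                        "a", "b", "c"]}

def embed(formula, obs):
    """Recursively evaluate the network tree mirroring the formula tree."""
    if isinstance(formula, str):                 # atomic proposition
        return modules[formula](obs, [])
    op, *args = formula                          # e.g. ("until", "a", "b")
    return modules[op](obs, [embed(arg, obs) for arg in args])

obs = torch.randn(OBS_DIM)
h = embed(("eventually", ("and", "a", "b")), obs)  # unseen formula, same modules
print(h.shape)  # torch.Size([32])
```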
“…This manuscript is an extension of “Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas” ( Kuo et al, 2020b ) by the same authors published at the Conference on Intelligent Robots and Systems (IROS) 2020. The manuscript contains over 30% new material, including additional technical details and a new domain, Fetch, which required substantive advances.…”
confidence: 99%