The 2020 Conference on Artificial Life
DOI: 10.1162/isal_a_00318
Safe Reinforcement Learning through Meta-learned Instincts

Abstract: An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions performed by the agent […]

Cited by 6 publications (4 citation statements) · References 24 publications
“…The instinctual network is aware of the action a^P_i as well as the state observation s_i at step i, creating the instinct state observation s^I_i := (s_i, a^P_i). This is in contrast to our previous MLIN approach (Grbic and Risi, 2020), in which the instinct co-evolved to expect what kind of behavior the policy performs around hazards and therefore did not need a^P_i as input. In our IR^2L approach, the instinct needs to work with a random policy on a task where hazards could be distributed differently than during pretraining; the instinct needs to know what the policy wants to execute so it can modulate it accordingly.…”
Section: Approach: Instinct Regulated Reinforcement Learning
confidence: 60%
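For intuition, here is a minimal sketch of the control step that quote describes. It is not the authors' exact formulation: the callable names, the assumption that the instinct returns a fallback action plus a suppression gate in [0, 1], and the linear blending rule are all illustrative.

```python
import numpy as np

def instinct_modulated_step(policy, instinct, s_i):
    """One IR^2L-style step (illustrative interface, not the published one).

    The instinct observes both the state s_i and the policy's proposed
    action a^P_i, i.e. s^I_i := (s_i, a^P_i), and modulates the action.
    """
    a_p = policy(s_i)                        # a^P_i proposed by the policy
    s_instinct = np.concatenate([s_i, a_p])  # s^I_i := (s_i, a^P_i)
    a_i, gate = instinct(s_instinct)         # fallback action, gate in [0, 1]
    return (1.0 - gate) * a_p + gate * a_i   # gate -> 1 overrides the policy

# Toy usage with stand-in networks:
policy = lambda s: np.tanh(s[:2])
instinct = lambda s_i: (np.zeros(2), 0.9)    # near a hazard: mostly override
print(instinct_modulated_step(policy, instinct, np.ones(4)))
```

The key point the quote makes is visible in the second line of the function: because the instinct here must regulate an arbitrary, possibly random policy, it is given a^P_i as part of its observation rather than having to predict it.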
“…In this paper, we are building on the Meta-Learned Instinctual Network (MLIN) approach (Grbic and Risi, 2020), where a policy neural network is split into two major components: a main network trained for a specific task, and a fixed pre-trained instinctual network that transfers between tasks and overrides the main policy if the agent is about to execute a dangerous action. However, meta-learning can be quite expensive since it relies on two nested learning loops: an inner task-specific loop and an outer meta-learning loop.…”
Section: Introduction
confidence: 99%
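As a rough illustration of that two-component split — layer sizes, the sigmoid gate, and the blending are assumptions, not the published MLIN architecture — a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class MLINPolicy(nn.Module):
    """Sketch of the MLIN split: a task-trained main network plus a
    pre-trained instinct network that stays frozen during task adaptation.
    Sizes and the gating scheme are illustrative assumptions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.main = nn.Sequential(        # trained for the current task
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        self.instinct = nn.Sequential(    # transfers between tasks
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim + 1))
        for p in self.instinct.parameters():
            p.requires_grad = False       # instinct is fixed after pretraining

    def forward(self, obs):
        a_main = self.main(obs)
        out = self.instinct(obs)          # note: the instinct sees only the state
        a_safe, gate = out[..., :-1], torch.sigmoid(out[..., -1:])
        # gate -> 1 near hazards: the instinct overrides the main policy
        return (1 - gate) * a_main + gate * a_safe
```

The cost the quote refers to comes from how such an instinct is obtained: an inner loop adapts `self.main` to each task while an outer meta-learning loop updates the instinct across tasks, which is what the IR^2L follow-up aims to avoid.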
“…Work in transfer learning has leveraged meta-RL [14] for safe adaptation [18,32,30]. Our work is also related to curriculum learning [5,51,33].…”
Section: Related Work
confidence: 99%
“…By contrast, MESA explicitly reasons about safety constraints in the environment to learn adaptable risk measures. Additionally, while prior work has also explored using meta-learning in the context of safe-RL [24], specifically by learning a single safety filter which keeps policies adapted for different tasks safe, we instead adapt the risk measure itself to unseen dynamics and fault structures.…”
Section: A.4.2 Meta Reinforcement Learning
confidence: 99%