Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
DOI: 10.1145/3377929.3389946

Safer reinforcement learning through evolved instincts

Cited by 3 publications (4 citation statements) | References 2 publications
“…The instinctual network is aware of the action $a_i^P$ as well as the state observation $s_i$ at step $i$, creating the instinct state observation $s_i^I := (s_i, a_i^P)$. This is in contrast to our previous MLIN approach (Grbic and Risi, 2020), in which the instinct co-evolved to expect what kind of behavior the policy performs around hazards and therefore did not need $a_i^P$ as input. In our IR²L approach, the instinct needs to work with a random policy on a task where hazards could be distributed differently than during pretraining; the instinct needs to know what the policy wants to execute so it can modulate it accordingly.…”
Section: Approach: Instinct Regulated Reinforcement Learning
confidence: 60%
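The mechanism this quote describes (an instinct network that reads both the state and the policy's proposed action, then modulates that action) can be sketched in a few lines. The following is a minimal illustrative sketch only, not the authors' code: the class name `InstinctNet`, the sigmoid gating formulation, and all dimensions are assumptions.

```python
# Minimal illustrative sketch (assumptions, not the authors' code) of the
# IR²L instinct input described above: the instinct sees both the state
# observation s_i and the policy's proposed action a_i^P, and returns a
# per-dimension gate used to modulate that action.
import numpy as np

class InstinctNet:
    """Toy stand-in for a pretrained instinct network (hypothetical)."""

    def __init__(self, obs_dim: int, act_dim: int, rng: np.random.Generator):
        # One linear layer mapping (s_i, a_i^P) to a gate in (0, 1) per
        # action dimension; the sigmoid gating is an assumed formulation.
        self.w = rng.normal(scale=0.1, size=(obs_dim + act_dim, act_dim))

    def __call__(self, s: np.ndarray, a_policy: np.ndarray) -> np.ndarray:
        s_instinct = np.concatenate([s, a_policy])  # s_i^I := (s_i, a_i^P)
        return 1.0 / (1.0 + np.exp(-(s_instinct @ self.w)))

def modulate(a_policy: np.ndarray, gate: np.ndarray) -> np.ndarray:
    # Suppress action components the instinct flags as hazardous.
    return a_policy * (1.0 - gate)

rng = np.random.default_rng(0)
instinct = InstinctNet(obs_dim=4, act_dim=2, rng=rng)
s_i = rng.normal(size=4)   # state observation s_i
a_p = rng.normal(size=2)   # action proposed by the (possibly random) policy
print(modulate(a_p, instinct(s_i, a_p)))
```

Because the gate conditions on $a_i^P$, the instinct can react to what an untrained policy is about to do, which is the distinction the quote draws against MLIN.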
“…In this paper we are building on the Meta-Learned Instinctual Network (MLIN) approach (Grbic and Risi, 2020), where a policy neural network is split into two major components: a main network trained for a specific task, and a fixed pre-trained instinctual network that transfers between tasks and overrides the main policy if the agent is about to execute a dangerous action. However, meta-learning can be quite expensive since it relies on two nested learning loops: an inner task-specific loop and an outer meta-learning loop.…”
Section: Introduction
confidence: 99%
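The nested structure this quote blames for the cost of meta-learning can be shown as a toy skeleton. Everything below is an illustrative assumption (a (1+1)-style hill climber on a single scalar "instinct"), not the MLIN implementation.

```python
# Toy skeleton of the two nested loops: an outer loop that evolves a shared
# instinct parameter, and an inner loop that trains a fresh task-specific
# policy for every candidate. All functions and numbers are hypothetical.
import random

def inner_loop(task: float, instinct: float, steps: int = 50) -> float:
    """Toy task-specific training; returns fitness on this task."""
    policy = 0.0
    for _ in range(steps):
        # Toy update: the policy moves toward the task optimum, slightly
        # biased by the shared instinct parameter.
        policy += 0.1 * ((task + 0.1 * instinct) - policy)
    return -abs(task - policy)  # closer to the task optimum is better

def outer_loop(generations: int = 20, tasks_per_gen: int = 5) -> float:
    """Toy outer loop: (1+1)-style evolution of the shared instinct."""
    rng = random.Random(0)
    instinct = rng.uniform(-1.0, 1.0)
    for _ in range(generations):
        candidate = instinct + rng.gauss(0.0, 0.1)
        tasks = [rng.uniform(-1.0, 1.0) for _ in range(tasks_per_gen)]
        # Every candidate pays for a full inner loop on every task,
        # which is the nesting that makes meta-learning expensive.
        if (sum(inner_loop(t, candidate) for t in tasks)
                >= sum(inner_loop(t, instinct) for t in tasks)):
            instinct = candidate
    return instinct

print(outer_loop())
```

The total cost scales as generations × tasks × inner steps, which is the expense the quote points to.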
“…However, such approaches may require ad hoc tuning of the constraint violation reward and may result in unsafe decisions during the exploration phase. In the second category, the safety of the decisions is promoted by offline (batch) learning to initialize the exploration [16] or by the transfer of expert knowledge learned offline to guide the exploration [17]–[19]. Despite significant improvements, these approaches cannot provide safety guarantees and are not suitable for fully online learning.…”
Section: Introduction
confidence: 99%
“…However, such approaches may require ad hoc tuning of the constraint violation reward and may result in unsafe decisions during the exploration phase. In the second category, safety of the decisions is promoted by offline (batch) learning to initialize the exploration [12], or by the transfer of expert knowledge learned offline to guide the exploration [13]–[15]. Despite significant improvements, these approaches cannot provide theoretical safety guarantees, and are not suitable for fully online learning.…”
Section: Introduction
confidence: 99%
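Both versions of this passage contrast two safe-RL categories; the "constraint violation reward" in the first category amounts to a hand-tuned penalty folded into the reward. A minimal sketch of that shaping follows, with an assumed penalty weight and violation flag; none of it is taken from the cited papers.

```python
# Minimal sketch of a constraint-penalized reward: r' = r - lambda * 1[violation].
# The penalty weight must be tuned ad hoc: too small and the agent ignores
# the constraint, too large and it avoids exploring near the boundary.

def shaped_reward(reward: float, violated: bool,
                  penalty_weight: float = 10.0) -> float:
    return reward - (penalty_weight if violated else 0.0)

print(shaped_reward(1.0, violated=True))   # -9.0
print(shaped_reward(1.0, violated=False))  #  1.0
```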