2022
DOI: 10.48550/arxiv.2201.08102
Preprint

Safe Deep RL in 3D Environments using Human Feedback

Abstract: Agents should avoid unsafe behaviour during both training and deployment. This typically requires a simulator and a procedural specification of unsafe behaviour. Unfortunately, a simulator is not always available, and procedurally specifying constraints can be difficult or impossible for many real-world tasks. A recently introduced technique, ReQueST, aims to solve this problem by learning a neural simulator of the environment from safe human trajectories, then using the learned simulator to efficiently learn …
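The pipeline the abstract describes can be made concrete with a short sketch. The following Python is a minimal illustration of a ReQueST-style loop, assuming the standard setup in which the learned simulator generates hypothetical rollouts for humans to label; every name here (collect_safe_human_trajectories, fit_neural_simulator, human_feedback) is hypothetical, and a linear least-squares dynamics model stands in for the neural simulator purely so the sketch runs.

    # Illustrative ReQueST-style pipeline (hypothetical names, not the paper's code):
    # 1) fit a dynamics model on safe human trajectories only,
    # 2) roll out candidate behaviour inside the learned model,
    # 3) gather human labels on those simulated rollouts, so no unsafe
    #    behaviour is ever executed in the real environment.
    import numpy as np

    rng = np.random.default_rng(0)

    def collect_safe_human_trajectories(n=100, horizon=20, obs_dim=4, act_dim=2):
        # Stand-in for human demonstrations that avoid unsafe states.
        return [[(rng.normal(size=obs_dim), rng.normal(size=act_dim))
                 for _ in range(horizon)] for _ in range(n)]

    def fit_neural_simulator(trajectories):
        # Stand-in for learning p(s' | s, a); a linear least-squares model
        # replaces the neural network to keep the sketch self-contained.
        X, Y = [], []
        for traj in trajectories:
            for (s, a), (s_next, _) in zip(traj[:-1], traj[1:]):
                X.append(np.concatenate([s, a]))
                Y.append(s_next)
        W, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(Y), rcond=None)
        return lambda s, a: np.concatenate([s, a]) @ W

    def human_feedback(rollout):
        # Stand-in for a human label on a *simulated* rollout.
        return float(np.mean([s.sum() for s, _ in rollout]) > 0)

    def train(num_rounds=3, horizon=20):
        sim = fit_neural_simulator(collect_safe_human_trajectories())
        labelled = []
        for _ in range(num_rounds):
            s, rollout = rng.normal(size=4), []
            for _ in range(horizon):
                a = rng.normal(size=2)      # placeholder policy
                rollout.append((s, a))
                s = sim(s, a)               # step the *learned* simulator
            labelled.append((rollout, human_feedback(rollout)))
        # A reward model would now be fit on `labelled`, and a policy
        # trained against it in simulation before deployment.
        return labelled

    print(f"collected {len(train())} labelled simulated rollouts")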

Cited by 1 publication (4 citation statements)
References 9 publications
“…In other settings, we could assume that the agent (e.g. a robot) uses an attached camera to obtain the observations [14]. Finally, here we do not consider any debatable (normative) occasions, i.e.…”
Section: The Safe Exploration Problem (mentioning)
confidence: 99%
“…We compare JPAL-HA as closely as possible with Human Intervention RL (HIRL) [24]. We did not compare with ReQueST [14,25], another human-in-the-loop method with a focus on safety, because the necessary approximations would make a comparison unrealistic; for example, ReQueST uses off-policy data, and there is a question of how these data can be collected. HIRL [24] learns in three stages: (i) the agent learns a policy using standard reinforcement learning methods with a human overseeing it, blocking catastrophic actions and proposing an alternative (normally the best, when they understand the environment) action; (ii) the agent pauses, and a binary catastrophe classifier is built from the record of interventions from Stage (i) using supervised learning; and (iii) the agent starts again, but now the human is substituted by the catastrophe classifier (a kind of governor).…”
Section: Queries or Human Intervention? (mentioning)
confidence: 99%
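Read as pseudocode, the three HIRL stages in the quotation above map onto a simple control flow. The Python sketch below is one hedged reading of that description rather than the HIRL authors' implementation: human_oversees, the one-dimensional toy actions, and the threshold classifier are all illustrative assumptions.

    # Sketch of the three HIRL stages described in the quotation
    # (illustrative names, not the HIRL authors' code).
    import numpy as np

    rng = np.random.default_rng(0)

    def human_oversees(state, action):
        # Stage (i) overseer: block a "catastrophic" action and propose an
        # alternative. In this toy, catastrophe = action magnitude > 1.5.
        if abs(action) > 1.5:
            return True, 0.0            # blocked; safe alternative action
        return False, action

    def stage1_collect_interventions(n_steps=1000):
        records = []
        for _ in range(n_steps):
            state, action = rng.normal(), rng.normal()
            blocked, safe_action = human_oversees(state, action)
            records.append((state, action, blocked))
            # ...a standard RL update would proceed with safe_action here...
        return records

    def stage2_fit_catastrophe_classifier(records):
        # Stage (ii): supervised binary classifier on the intervention
        # record; here a trivial threshold rule learned from blocked labels.
        blocked = [abs(a) for _, a, b in records if b]
        threshold = min(blocked) if blocked else float("inf")
        return lambda state, action: abs(action) >= threshold

    def stage3_train_with_governor(classifier, n_steps=1000):
        # Stage (iii): training resumes with the classifier (the "governor")
        # substituting for the human overseer.
        n_blocked = 0
        for _ in range(n_steps):
            state, action = rng.normal(), rng.normal()
            if classifier(state, action):
                action, n_blocked = 0.0, n_blocked + 1
            # ...the RL update continues with the (possibly overridden) action...
        return n_blocked

    governor = stage2_fit_catastrophe_classifier(stage1_collect_interventions())
    print(f"governor blocked {stage3_train_with_governor(governor)} actions")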