2022
DOI: 10.21203/rs.3.rs-2406802/v1
Preprint

Learning Safe Behaviour via Justified Human Preferences and Hypothetical Queries

Abstract: Although reinforcement learning is a powerful paradigm for agent sequential decision-making, it cannot be used in its traditional form in most safety-critical environments. Human feedback can enable an agent to learn a good policy while avoiding unsafe states, but at the cost of human time. We present JPAL-HA, a model for safe learning in safety-critical environments that is grounded on two novel ideas: (i) human preferences over a choice of actions are augmented with justifications such as one action is prefe…

Cited by 2 publications (4 citation statements)
References: 21 publications
“…There are some prior works that investigate human-centered SRRL. For instance, Kazantzidis et al. (2022) introduced a mechanism to ensure safety during exploration by harnessing human preferences. Reddy et al. (2020) present a method to learn a model of human objectives by leveraging human feedback on hypothetical behaviors, and the model can then be used to ensure the safety of robot learning.…”
Section: Related Work on Safe Robot-Reinforcement Learning (mentioning)
Confidence: 99%
“…The related problem of how to align agents with user intentions has been discussed in earlier literature (Christiano et al., 2018; Leike et al., 2018; Kazantzidis et al., 2022; Liu R. et al., 2022), in which meaningful training signals can be hard to obtain due to the unpredictable long-term effects of behaviors, or their potential influence on other agents and environments in large multi-agent systems.…”
Section: Safety Value Alignment (mentioning)
Confidence: 99%
“…• Human-in-the-loop interactive learning for robust and trusted decision making, including studies on (Kazantzidis et al. 2022; Gu et al. 2022; Liu et al. 2022).…”
Citation type: mentioning
Confidence: 99%