2018
DOI: 10.1007/978-3-319-96145-3_38

Safety-Aware Apprenticeship Learning

Abstract: Apprenticeship learning (AL) is a kind of Learning from Demonstration technique in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, a…
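
A minimal sketch of the linear-reward setting the abstract describes, written in the standard apprenticeship-learning notation of Abbeel and Ng (the feature map φ, weight vector w, and discount factor γ below are our assumed notation, not symbols quoted from this paper):

% Linear reward assumption: the unknown reward is a weighted sum
% of known state features \phi, with a bounded weight vector w.
R(s) = w^{\top} \phi(s), \qquad \lVert w \rVert_2 \le 1

% Feature expectations summarize a policy's discounted long-run behavior.
\mu(\pi) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \phi(s_t) \;\middle|\; \pi \right]

Because the reward is linear in φ, any policy whose feature expectations μ(π) lie within ε of the expert's μ_E achieves value within ε of the expert's under every reward in this class (by Cauchy-Schwarz), which is why AL can succeed without recovering the true reward exactly.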

Cited by 26 publications (16 citation statements)
References 23 publications

Citation statements (ordered by relevance):
“…Valko et al. (2013) extend the max margin formulation to the semi-supervised case, where trajectories provided by an expert are assumed to be the labeled examples and trajectories provided by a non-expert are assumed to be the unlabeled examples. Zhou and Li (2018) develop a safety-aware version of max margin apprenticeship learning. A policy learned through apprenticeship learning can lead an agent into an unsafe state because an expert only supplies positive examples…”
Section: Maximum Margin Methods (mentioning)
confidence: 99%
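
For context, the max margin step these statements refer to can be sketched in the usual Abbeel-and-Ng form (our paraphrase; μ_E is the expert's feature expectations and π^{(1)}, …, π^{(j)} are the policies found so far, none of which is notation quoted from the cited papers):

% Find the reward weights under which the expert outperforms
% every policy found so far by the largest margin t.
\max_{t,\, w} \; t \quad \text{s.t.} \quad w^{\top} \mu_E \ge w^{\top} \mu(\pi^{(i)}) + t \;\; \forall i \le j, \qquad \lVert w \rVert_2 \le 1

The safety-aware version cited here layers formal verification on top of this loop (as the last citation statement below also notes); the margin objective alone, trained only on positive expert examples, carries no safety guarantee.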
“…Using their taxonomy, shielding is an instance of "teacher provides advice" [9], where a teacher with additional information about the system guides the RL agent to pick the right actions. Apprenticeship learning [1] is a closely related variant in which the teacher gives (positive) examples; it has been used in the context of verification [42]. Uppaal Stratego synthesizes safe, permissive policies that are optimized via learning to create controllers for real-time systems [10]…”
Section: Related Work (mentioning)
confidence: 99%
“…Regarding LfD, inverse reward design (IRD) [19] designs reward functions in a manner similar to IRL. Safety-aware apprenticeship learning [44] incorporates formal specification and formal verification into IRL. However, those works confine the reward function to be linear in the features, as IRL does…”
Section: Related Work (mentioning)
confidence: 99%