2020
DOI: 10.1609/aaai.v34i06.6602

Learning from Interventions Using Hierarchical Policies for Safe Learning

Abstract: Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well on multiple complex tasks. However, a limitation of the typical LfD approach is that it requires expert demonstrations for all scenarios, including those in which the algorithm is already well-trained. The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer. The expert overseer only intervenes when it suspects that an unsafe action is about to be taken. Although LfI significantly improv…
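As a rough illustration of the intervention-driven data collection described in the abstract, the following Python sketch shows an LfI-style training loop. The env, expert, and policy interfaces, the is_unsafe check, and the behavior-cloning fit update are assumed names for illustration only, not the paper's implementation; the paper's actual method additionally organizes the learner as hierarchical policies.

    # Minimal sketch of a Learning-from-Interventions (LfI) loop.
    # Assumptions: `env` follows a Gym-style API, `expert.is_unsafe(state, action)`
    # and `expert.act(state)` stand in for the overseer, and `policy` is any
    # trainable behavior-cloning model. Illustrative only.

    def collect_with_interventions(env, policy, expert, episodes=10):
        dataset = []  # (state, corrective_action) pairs gathered only at interventions
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                action = policy.act(state)
                if expert.is_unsafe(state, action):
                    # The overseer intervenes only when the learner's action looks
                    # unsafe, so expert labels are requested far less often than in
                    # plain LfD, where every step must be demonstrated.
                    action = expert.act(state)
                    dataset.append((state, action))
                state, reward, done, info = env.step(action)
        return dataset

    def train_lfi(env, policy, expert, rounds=5):
        for _ in range(rounds):
            data = collect_with_interventions(env, policy, expert)
            policy.fit(data)  # supervised (behavior-cloning) update on intervention data
        return policy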

Cited by 6 publications (3 citation statements)
References 18 publications
“…Deep reinforcement learning (Deep RL) algorithms use either discrete or continuous state-action spaces. Although continuous state-action spaces are suitable for complex and dynamic environments [1,2], they do involve some downsides [3,4], such as a long training time [5][6][7], a high amount of complexity [8,9], and sample inefficiency [10]. One of the proposed solutions to overcome these issues is to combine deep RL algorithms with learning from demonstration (LfD) [11][12][13][14][15].…”
Section: Introduction
confidence: 99%
“…In LfD, human experts must put in substantial effort before the RL algorithm is trained and cover all possible steps in the environment, which is impossible in a continuous space [18][19][20]. It therefore needs a large number of data samples, and the excessive reliance on optimal actions may sometimes cause policy divergence [5,21,22].…”
Section: Introduction
confidence: 99%