2019 · DOI: 10.1609/aaai.v33i01.33012462

Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Abstract: This paper investigates how to utilize different forms of human interaction to safely train autonomous systems in real-time by learning from both human demonstrations and interventions. We implement two components of the Cycle-of-Learning for Autonomous Systems, which is our framework for combining multiple modalities of human interaction. The current effort employs human demonstrations to teach a desired behavior via imitation learning, then leverages intervention data to correct for undesired behaviors produc…

Cited by 39 publications (42 citation statements) · References 2 publications
“…Again, due to faster learning, our proposed method sub-goal+LbB uses the least amount of data. Our results also confirm results from (Goecks et al 2019) that Learning from Intervention (LfI) is data-efficient, as it uses only the intervention data rather than all data. As the demonstrations continue, it becomes more likely to encounter seen states and the states where the algorithm already performs well.…”
Section: Experiments and Results (supporting)
confidence: 86%
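The data-efficiency point above rests on a simple mechanism: only the steps where the human intervened are kept for training. A minimal sketch of that filtering step is shown below; the field names (`obs`, `action`, `intervention`) are illustrative assumptions, not identifiers from the cited papers.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Transition:
    """One logged step: observation, action taken, and whether a human intervened."""
    obs: np.ndarray
    action: np.ndarray
    intervention: bool  # True if the human overrode the agent on this step


def intervention_only(trajectory: List[Transition]) -> List[Transition]:
    """Learning-from-Intervention keeps only the human-corrected steps,
    discarding the (usually much larger) set of purely autonomous steps."""
    return [t for t in trajectory if t.intervention]
```

For example, a 1000-step rollout in which the human intervened on 50 steps yields a 50-sample training set rather than 1000, which is where the data-efficiency claim comes from.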
“…• CoL stands for Cycle-of-Learning (Goecks et al 2019), which uses only intervention data as additional demonstration data and ignores the non-intervention data.…”
Section: Discussion (mentioning)
confidence: 99%
“…The policy is allowed to roll out and is trained with a combined loss from a mix of demonstration and agent data, stored in a separate first-in-first-out buffer. We validate our approach in three environments with continuous observation and action spaces: LunarLanderContinuous-v2 (Brockman et al 2016) (dense and sparse reward cases) and a custom quadrotor landing task (Goecks et al 2019) implemented using Microsoft AirSim (Shah et al 2017). The dense reward case of LunarLanderContinuous-v2 is the standard environment provided by the OpenAI Gym library (Brockman et al 2016): the state space consists of an eight-dimensional continuous vector with inertial states of the lander, the action space consists of a two-dimensional continuous vector controlling main and side thrusts, and reward is given at every step based on the relative motion of the lander with respect to the landing pad (a bonus reward is given when the landing is completed successfully).…”
Section: Methods (mentioning)
confidence: 99%
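The training setup described in this snippet — demonstration data alongside agent roll-out data held in a separate first-in-first-out buffer, with updates drawn from both — can be sketched as below. Buffer capacity, batch size, and the demonstration fraction are illustrative assumptions, not values from the cited work.

```python
import random
from collections import deque


class FIFOBuffer:
    """Bounded first-in-first-out buffer for agent roll-out transitions."""

    def __init__(self, capacity: int = 100_000):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, transition) -> None:
        self.data.append(transition)

    def sample(self, n: int) -> list:
        return random.sample(list(self.data), min(n, len(self.data)))


def mixed_batch(demo_data: list, agent_buffer: FIFOBuffer,
                batch_size: int = 64, demo_fraction: float = 0.25) -> list:
    """Draw one training batch mixing demonstration and agent data.

    The 25% demonstration fraction is an illustrative choice, not a value
    taken from the cited papers; the combined loss is then computed over
    this mixed batch.
    """
    n_demo = int(batch_size * demo_fraction)
    demo_part = random.sample(demo_data, min(n_demo, len(demo_data)))
    agent_part = agent_buffer.sample(batch_size - len(demo_part))
    return demo_part + agent_part
```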
“…Considering these two facts, in this work we deployed an intervention-based DAgger algorithm so that the human pilot can always take over control when the UAV has reached an unsafe region and provide recovery actions. Relevant work [2,14,21] has shown that the intervention-based approach can learn a policy more effectively and achieve better performance.…”
Section: Related Work (mentioning)
confidence: 99%
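The intervention-gated roll-out loop described here — the policy acts until the human pilot takes over in an unsafe region, and only the pilot's recovery actions are aggregated — can be sketched as follows. The `env`, `policy`, and `human` interfaces are hypothetical stand-ins rather than APIs from the cited papers; `env.step` is assumed to follow the Gym convention.

```python
def intervention_rollout(env, policy, human, dataset, max_steps=1000):
    """Roll out the learned policy; whenever the human pilot takes over,
    record (observation, recovery action) pairs as DAgger-style corrections."""
    obs = env.reset()
    for _ in range(max_steps):
        if human.is_intervening():
            action = human.get_action(obs)   # recovery action from the pilot
            dataset.append((obs, action))    # aggregate only intervention data
        else:
            action = policy.act(obs)         # autonomous action, not recorded
        obs, _reward, done, _info = env.step(action)
        if done:
            break
    return dataset
```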