2019
DOI: 10.48550/arxiv.1910.04281
Preprint

Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments

Vinicius G. Goecks,
Gregory M. Gremillion,
Vernon J. Lawhern
et al.

Abstract: This paper investigates how to efficiently transition and update policies, trained initially with demonstrations, using off-policy actor-critic reinforcement learning. It is well-known that techniques based on Learning from Demonstrations, for example behavior cloning, can lead to proficient policies given limited data. However, it is currently unclear how to efficiently update that policy using reinforcement learning as these approaches are inherently optimizing different objective functions. Previous works h…
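A minimal sketch of the two-phase recipe the abstract describes: first pre-train a policy with behavior cloning on demonstrations, then keep updating the same actor with an off-policy actor-critic loss. The network sizes, hyperparameters, placeholder data, and the simplified DDPG-style critic update (no target networks) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: behavior cloning pre-training followed by off-policy actor-critic updates.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Phase 1: behavior cloning on (observation, expert action) pairs.
demo_obs = torch.randn(256, obs_dim)              # placeholder demonstration data
demo_act = torch.tanh(torch.randn(256, act_dim))
for _ in range(200):
    bc_loss = ((actor(demo_obs) - demo_act) ** 2).mean()
    actor_opt.zero_grad(); bc_loss.backward(); actor_opt.step()

# Phase 2: off-policy actor-critic updates on replay-buffer transitions.
obs = torch.randn(256, obs_dim)                   # placeholder replay samples
act = torch.tanh(torch.randn(256, act_dim))
rew = torch.randn(256, 1)
next_obs = torch.randn(256, obs_dim)
gamma = 0.99
for _ in range(200):
    with torch.no_grad():
        target_q = rew + gamma * critic(torch.cat([next_obs, actor(next_obs)], dim=-1))
    critic_loss = ((critic(torch.cat([obs, act], dim=-1)) - target_q) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```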


Cited by 3 publications (3 citation statements)
References 5 publications
“…Additionally, agents can learn from natural language-defined goals [185]. Finally, agents can learn from combining human data with reinforcement learning [137,59,139].…”
Section: Robotic Navigation (mentioning)
confidence: 99%
“…First, SAC Behavioral Cloning (SACBC) uses Behavioral Cloning [20] as a regularization mechanism for the update of the actor. This is inspired by various other RLfD algorithms [21,22,23]. These methods all propose two main components.…”
Section: B. Guiding RL Using Demonstrations (mentioning)
confidence: 99%
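As a hedged illustration of the regularization idea in the statement above (a behavior-cloning term added to the actor's RL objective), the snippet below mixes a Q-maximizing loss with a BC loss toward demonstrated actions. The weighting `bc_coef`, the deterministic actor, and the placeholder batches are assumptions for the sketch, not the cited algorithm's exact formulation.

```python
# Sketch: actor update regularized by a behavior-cloning term.
import torch
import torch.nn as nn

obs_dim, act_dim, bc_coef = 8, 2, 0.5
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

replay_obs = torch.randn(128, obs_dim)            # placeholder replay batch
demo_obs = torch.randn(128, obs_dim)              # placeholder demonstration batch
demo_act = torch.tanh(torch.randn(128, act_dim))

# Actor loss = RL objective (maximize Q) + weighted BC regularizer
# pulling the policy toward the demonstrated actions.
rl_loss = -critic(torch.cat([replay_obs, actor(replay_obs)], dim=-1)).mean()
bc_loss = ((actor(demo_obs) - demo_act) ** 2).mean()
actor_loss = rl_loss + bc_coef * bc_loss
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```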
“…They introduced a Behavior Cloning (BC) loss to the DRL optimization function and showed that the agent is more efficient than a baseline DDPG method. Similarly, Goecks et al [12] proposed a two-phase combination of BC and DRL, where demonstrations were used to pretrain the network followed by training a DRL agent to produce an adaptable behavior. In this paper, we experiment with the latter for training expert subtasks.…”
Section: Related Work (mentioning)
confidence: 99%