2022
DOI: 10.48550/arxiv.2211.04786
Preprint
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration

Abstract: Deep Reinforcement Learning has been successfully applied to learn robotic control. However, the corresponding algorithms struggle when applied to problems where the agent is only rewarded after achieving a complex task. In this context, using demonstrations can significantly speed up the learning process, but demonstrations can be costly to acquire. In this paper, we propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. To do so, our method lea…

Cited by 1 publication (7 citation statements)
References 26 publications
“…To evaluate the drop in efficiency induced by resetting the agent to a single state, we compare SR-DCIL to DCIL-II. Indeed, by resetting the agent to demonstrated states, DCIL-II not only overcomes the limits underlined in Section IV-A, but it also learns a complex behavior by training on short rollouts only [2]. Therefore, DCIL-II should be more sample efficient than RF-DCIL at learning complex behaviors.…”
Section: B. Baseline
confidence: 99%
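The statement above describes DCIL-II's key trick: resetting the agent directly to states taken from the demonstration so that training only requires short rollouts. A minimal sketch of that idea follows; the `ChainEnv` toy environment, its `reset_to` API, and the rollout parameters are all illustrative assumptions, not the authors' implementation:

```python
import random

# Toy 1-D chain environment with a reset-to-state API (assumed for illustration).
class ChainEnv:
    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset_to(self, state):
        # Reset the agent directly to a demonstrated state.
        self.pos = state
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        return self.pos, float(done), done

# A single demonstrated trajectory: the states visited on the way to the goal.
demo_states = list(range(10))

def train_on_short_rollouts(env, policy, n_rollouts=50, horizon=3):
    """Reset to a random demonstrated state, then roll out only a few steps,
    so the agent is trained close to the demonstrated behavior."""
    successes = 0
    for _ in range(n_rollouts):
        s = env.reset_to(random.choice(demo_states))
        for _ in range(horizon):
            s, reward, done = env.step(policy(s))
            if done:
                successes += 1
                break
    return successes

env = ChainEnv()
wins = train_on_short_rollouts(env, policy=lambda s: +1)
```

Because each rollout starts from somewhere along the demonstration, even a short horizon occasionally reaches the goal, which is the sample-efficiency advantage the quoted passage attributes to DCIL-II over the reset-free RF-DCIL variant.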
“…As an extension of DCIL-II [2], SR-DCIL extracts a sequence of goals from a demonstration and learns to reach them sequentially to reproduce the complex demonstrated behavior. As explained in Section II-A, this strategy is adopted by various classes of RL algorithms.…”
Section: Related Work
confidence: 99%
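The goal-extraction step described in the quote above — turning a single demonstration into an ordered sequence of goals to be reached one after another — can be sketched as follows. This is an illustrative reconstruction under assumptions (evenly spaced subsampling, a state-trajectory demonstration format), not the SR-DCIL implementation:

```python
import numpy as np

def extract_goals(demo_states, n_goals):
    """Subsample a demonstrated state trajectory into an ordered sequence
    of intermediate goals; the final demonstrated state is always included."""
    demo_states = np.asarray(demo_states)
    # Evenly spaced indices along the demonstration, ending at the last state.
    idx = np.linspace(0, len(demo_states) - 1, n_goals + 1)[1:]
    return demo_states[idx.round().astype(int)]

# A toy 1-D demonstration: 11 states from 0.0 to 1.0.
demo = np.linspace(0.0, 1.0, 11)
goals = extract_goals(demo, n_goals=5)
# goals is [0.2, 0.4, 0.6, 0.8, 1.0]: five ordered goals ending at the
# final demonstrated state, to be reached sequentially by the agent.
```

A goal-conditioned policy would then be trained to reach `goals[0]`, and from there `goals[1]`, and so on, reproducing the demonstrated behavior as a chain of short goal-reaching tasks.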