2021
DOI: 10.1109/lra.2021.3068655

Batch Exploration With Examples for Scalable Robotic Reinforcement Learning

Abstract: Learning from diverse offline datasets is a promising path towards learning general purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data, while not depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific reward function, and collect data that can be useful for any downstream task. While these approaches have shown some…

Citations: cited by 16 publications (19 citation statements)
References: 33 publications
“…For adversarial learning techniques, we use gradient penalty (GP) to avoid over-fitting the discriminator. Other options for discriminator regularization techniques include spectral normalization [36], Mixup [37,38], and PUGAIL [39], however we chose GP as it has been empirically shown to achieve decent performance across multiple tasks [40,41].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Much like our work, a number of prior works have studied how learning from broad datasets can enhance generalization in robot learning [16,33,56,13,22,24,10,5]. These works have largely studied the problem of collecting large and diverse robotic datasets in scalable ways [28,22,10,53,7] as well as techniques for learning general purpose policies from this style of data in an offline [13,5] or online [33,29,24] fashion. While our motivation of achieving generalization by learning from diverse data heavily overlaps with the above works, our approach fundamentally differs in that it aims to sidestep the challenges associated with collecting diverse robotic data by instead leveraging existing human data sources.…”
Section: Robotic Learning From Large Datasets
Citation type: mentioning
confidence: 99%
“…Alternatively (Lee et al, 2020; Wayne et al, 2018) train a variational latent space model, but use it only as a filter and train a separate policy on top of the learned latent representation. Model-based RL learns a dynamics model either in the pixel space (Finn and Levine, 2017; Ebert et al, 2018) or in a latent space (Watter et al, 2015; Banijamali et al, 2018; Hafner et al, 2019; Ha and Schmidhuber, 2018; Kipf et al, 2019; Chen et al, 2020) and can either learn a policy within the model or deploy shooting-based planning methods. However, most of those prior works rely critically on online data collection to be successful.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…However, most of those prior works rely critically on online data collection to be successful. Visual foresight algorithms (Finn and Levine, 2017; Ebert et al, 2018; Suh and Tedrake, 2020; Yen-Chen et al, 2019; Chen et al, 2020) handle control from pixels in a fully offline setting, but do not explicitly tackle the distributional shift issue that arises; meanwhile, our method is designed to specifically address this. As a result, we find in Section 5.2 that our approach significantly outperforms visual foresight.…”
Section: Related Work
Citation type: mentioning
confidence: 99%