2022
DOI: 10.48550/arxiv.2205.03353
Preprint

How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

Abstract: Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a suboptimal policy we might have access to, for example from training on related prior tasks, or in simulation. To this en…
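The abstract describes reusing a suboptimal prior policy to cut down online interaction. Kickstarting is commonly realized as an auxiliary distillation term that pulls the learner's action distribution toward the prior policy while the usual RL loss is optimized. The sketch below is illustrative only, assuming a discrete-action policy with logits; the names (teacher_logits, rl_policy_loss, kickstart_coef) are hypothetical and not taken from the paper.

```python
# Illustrative sketch of a kickstarting-style auxiliary loss.
# All names here are hypothetical, not from the paper.
import torch
import torch.nn.functional as F

def kickstarting_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      rl_policy_loss: torch.Tensor,
                      kickstart_coef: float = 1.0) -> torch.Tensor:
    """Combine the usual RL policy loss with a KL term that pulls the
    student policy toward a (possibly suboptimal) teacher policy."""
    # KL(teacher || student) over the action distribution, averaged over the batch.
    distill = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return rl_policy_loss + kickstart_coef * distill
```

In practice the coefficient on the distillation term is typically annealed toward zero so the learner can eventually outperform the suboptimal teacher.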

Cited by 2 publications (4 citation statements)
References 11 publications
“…First, we would like to show that fine-tuning on object-specific data, similarly to what was done by Lee et al (2022), is beneficial. Therefore, we fine-tuned Gato separately on five subsets of demonstrations from the test dataset.…”
Section: Skill Generalization
confidence: 99%
“…Each subset was obtained by random partitioning of a test dataset consisting of demonstrations gathered by a generalist sim-to-real agent stacking real test objects. We consider this setting, which is comparable to the fine-tuning baselines on RGB stacking tasks from (Lee et al, 2022); and use the 5k dataset that their behavior cloning 5k results are obtained with. To best match their experiments, we change our return filtering scheme during training: instead of using only successful stacks, we condition on the normalized return of the episode.…”
Section: Skill Generalization
confidence: 99%
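The change described in the statement above, from keeping only successful stacks to conditioning on the normalized episode return, can be sketched as follows. This is an illustrative outline only; the episode fields ("success", "return") and the normalization constant are assumptions, not the data format used in the cited work.

```python
# Illustrative sketch of the two episode-selection schemes:
# success-only filtering vs. return conditioning.
from typing import Dict, List

def prepare_training_episodes(episodes: List[Dict],
                              use_return_conditioning: bool,
                              max_return: float = 1.0) -> List[Dict]:
    if not use_return_conditioning:
        # Success filtering: train only on successful stacks, no extra input.
        return [ep for ep in episodes if ep["success"]]

    # Return conditioning: keep every episode, but attach its normalized
    # return as an additional policy input at training time (the policy can
    # then be prompted with a high return at evaluation time).
    out = []
    for ep in episodes:
        ep = dict(ep)
        ep["conditioning"] = ep["return"] / max_return
        out.append(ep)
    return out
```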
“…The main contribution of this work is a demonstration that PTR can enable offline RL pre-training on diverse real-world robotic data, and that these pre-trained policies can be finetuned to learn new tasks with just 10-15 demonstrations or with autonomously collected online interaction data in the real world. This is a significant improvement over prior RL-based pre-training and fine-tuning methods, which typically require thousands of trials [50,22,20,6,29]. We present a detailed analysis of the design decisions that enable offline RL to provide an effective pre-training framework, and show empirically that these design decisions are crucial for good performance.…”
Section: Introduction
confidence: 98%
“…The most closely related to our work are prior methods that run model-free offline RL on diverse real-world data and then fine-tune on new tasks [50,22,20,6,29]. These prior methods typically only consider the setting of online fine-tuning, whereas in our experiments, we demonstrate the efficacy of PTR for offline fine-tuning (where we must acquire a good policy for the downstream task using 10-15 demonstrations) as well as online fine-tuning considered in these prior works, where we must acquire a new task entirely via autonomous interaction in the real world.…”
Section: Introduction
confidence: 99%