2020
DOI: 10.48550/arxiv.2012.11547
Preprint

Offline Reinforcement Learning from Images with Latent Space Models

Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, et al.

Abstract: Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions. Offline RL enables extensive use and re-use of historical datasets, while also alleviating safety concerns associated with online exploration, thereby expanding the real-world applicability of RL. Most prior work in offline RL has focused on tasks with compact state representations. However, the ability to learn directly from rich observation spaces like images is critical for real-…
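The abstract describes the offline RL setting: policies are learned entirely from a static dataset of logged transitions, with no online exploration. Below is a minimal sketch of that setting, assuming a small batch Q-learning loop over a fixed buffer; the network sizes, dataset layout, and low-dimensional observations are illustrative assumptions and not the paper's image-based method.

```python
# Minimal sketch of the offline RL setting: the agent learns only from a fixed
# dataset of logged transitions and never queries the environment. This is NOT
# the paper's algorithm; shapes, dataset layout, and the plain Q-learning update
# are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, n_actions = 16, 4  # assumed low-dimensional example, not image inputs

# Static dataset of (s, a, r, s', done) tuples collected by some behavior policy.
N = 10_000
dataset = {
    "obs":      torch.randn(N, obs_dim),
    "actions":  torch.randint(0, n_actions, (N,)),
    "rewards":  torch.randn(N),
    "next_obs": torch.randn(N, obs_dim),
    "dones":    torch.zeros(N),
}

q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
gamma = 0.99

for step in range(1_000):
    idx = torch.randint(0, N, (256,))          # sample a batch from the fixed dataset
    s, a = dataset["obs"][idx], dataset["actions"][idx]
    r, s2, d = dataset["rewards"][idx], dataset["next_obs"][idx], dataset["dones"][idx]

    with torch.no_grad():                      # bootstrapped target from the same dataset
        target = r + gamma * (1 - d) * q_net(s2).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                           # no environment interaction anywhere
```

The key constraint the sketch illustrates is that every update draws from the same fixed buffer; the safety and re-use benefits named in the abstract follow from never needing new environment interaction.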

Cited by 3 publications (4 citation statements)
References 25 publications
“…We showed the transfer capabilities of our algorithm to new tasks in a zero-shot imitation learning formulation. However, V-MAIL can in principle utilize any previously collected data for model training, enabling potential applications in offline imitation learning in conjunction with offline RL algorithms like Rafailov et al. [42].…”
Section: Discussion (mentioning)
confidence: 99%
“…Reinforcement learning from images is an inherently difficult task, since the agent needs to learn meaningful visual representations to support policy learning. A recent line of research [19,23,36,24,42] trains a variational model of the image-based environment as an auxiliary task, either for representation learning only [19,36] or for additionally generating on-policy data by rolling out the model [24]. Our method builds upon these ideas but, unlike these prior works, considers the problem of learning from visual demonstrations without access to rewards.…”
Section: Related Work (mentioning)
confidence: 99%
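The citation above describes training a variational model of an image-based environment as an auxiliary task and optionally rolling that model out to generate on-policy data. The sketch below illustrates this idea under simple assumptions (a fully connected encoder and decoder, a deterministic one-step latent transition); it is not the architecture of any of the cited papers.

```python
# Hedged sketch of a variational latent model trained as an auxiliary task, plus
# latent rollouts for generating imagined on-policy data. Architectures, loss
# weights, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

obs_shape = (3, 64, 64)
latent_dim, action_dim = 32, 4

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, 2 * latent_dim))          # outputs mean and log-variance
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, 3 * 64 * 64))
dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
                         nn.Linear(256, latent_dim))             # deterministic for brevity

def model_loss(obs, next_obs, action):
    """ELBO-style auxiliary loss: reconstruction + KL + latent transition prediction."""
    mean, log_var = encoder(obs).chunk(2, dim=-1)
    z = mean + torch.randn_like(mean) * (0.5 * log_var).exp()    # reparameterization trick
    recon = decoder(z).view(-1, *obs_shape)
    recon_loss = nn.functional.mse_loss(recon, obs)
    kl = -0.5 * (1 + log_var - mean.pow(2) - log_var.exp()).sum(-1).mean()

    next_mean, _ = encoder(next_obs).chunk(2, dim=-1)
    pred_next = dynamics(torch.cat([z, action], dim=-1))
    dyn_loss = nn.functional.mse_loss(pred_next, next_mean.detach())
    return recon_loss + 1e-3 * kl + dyn_loss

def imagine(z, policy, horizon=5):
    """Generate on-policy latent rollouts by repeatedly applying the learned dynamics."""
    traj = [z]
    for _ in range(horizon):
        a = policy(z)
        z = dynamics(torch.cat([z, a], dim=-1))
        traj.append(z)
    return torch.stack(traj)

# Illustrative call on random data (stand-in for a batch from an image dataset):
obs_batch = torch.rand(8, *obs_shape)
loss = model_loss(obs_batch, torch.rand(8, *obs_shape), torch.randn(8, action_dim))
```

The same learned latent space can then serve either purpose named in the quote: as a compact representation for a policy, or as a simulator for imagined rollouts via `imagine`.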
“…Model-free deep RL has been effective on short-horizon skills like object grasping [21,20,22], pushing and throwing [19,23,24], and multi-task learning [23,25] from images. Alternatively, model-based approaches [26,27], which explicitly learn the environment's forward dynamics from images, have also been employed for multi-task vision-based robotic manipulation tasks, either with planning algorithms [28,29,30,31,32,33,34,35] or for optimizing a parametric policy [36]. Unlike these prior works, our algorithm uses a model in conjunction with task-specific success classifiers and Q-functions to learn a wide range of skills suitable for sequencing into long-horizon tasks.…”
Section: Related Work (mentioning)
confidence: 99%
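The citation above contrasts using a learned forward model with a planning algorithm versus using it to optimize a parametric policy. Below is a hedged sketch of the planning route, assuming a simple random-shooting planner over a learned latent dynamics model and reward model; the planner, the stand-in models, and all dimensions are illustrative, not the method of any cited paper.

```python
# Hedged sketch of planning with a learned forward model: sample candidate
# action sequences, score them with the learned dynamics and reward models,
# and execute only the first action of the best sequence.
import torch
import torch.nn as nn

latent_dim, action_dim = 32, 4

def plan_random_shooting(z0, dynamics, reward_model, horizon=10, n_candidates=500):
    """Return the first action of the candidate sequence with the highest
    predicted return under the learned models."""
    actions = torch.randn(n_candidates, horizon, action_dim).clamp(-1.0, 1.0)
    z = z0.expand(n_candidates, latent_dim)
    returns = torch.zeros(n_candidates)
    for t in range(horizon):
        z = dynamics(torch.cat([z, actions[:, t]], dim=-1))   # imagined next latent state
        returns += reward_model(z).squeeze(-1)                # accumulate predicted reward
    best = returns.argmax()
    return actions[best, 0]                                   # execute only the first action

# Example usage with stand-in models (assumptions, for illustration only):
dyn = nn.Sequential(nn.Linear(latent_dim + action_dim, 64), nn.ReLU(),
                    nn.Linear(64, latent_dim))
rew = nn.Linear(latent_dim, 1)
first_action = plan_random_shooting(torch.zeros(latent_dim), dyn, rew)
```

The alternative mentioned in the quote, optimizing a parametric policy, would instead backpropagate returns of imagined rollouts into a policy network rather than re-planning at every step.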
“…They can realize this bias by estimating uncertainty in the value function [7,9,10,16], using importance-sampling-based algorithms [21][22][23][24][25][26][27], explicitly constraining the learned policy to be close to the dataset [8,28], learning a conservative value function [11], or using KL divergence [29,9,30] or MMD [10]. On the other hand, prior model-based offline RL methods [12][13][14][31][32][33][34][35][36][37] have studied model-uncertainty quantification [12,13,38], representation learning [14], constraining the policy to imitate the behavioral policy [34], and conservative estimation of the value function [15]. Different from these works, we propose ROMI to investigate a new direction in model-based offline RL, which provides a natural conservatism bias while maintaining the superior generalization benefits of model-based methods.…”
Section: Related Work (mentioning)
confidence: 99%
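The citation above lists conservatism mechanisms used in offline RL, including model-uncertainty quantification and conservative value estimation. The sketch below illustrates one such mechanism under stated assumptions: penalizing imagined rewards by the disagreement of a small dynamics-model ensemble. The ensemble size, penalty form, and coefficient are illustrative and not the exact scheme of any cited method.

```python
# Hedged sketch of uncertainty-penalized model rollouts: the agent distrusts
# imagined states where an ensemble of learned dynamics models disagrees.
import torch
import torch.nn as nn

state_dim, action_dim, ensemble_size = 16, 4, 5

ensemble = nn.ModuleList([
    nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                  nn.Linear(64, state_dim))
    for _ in range(ensemble_size)
])

def penalized_rollout_step(s, a, reward_model, penalty_coef=1.0):
    """One imagined step: mean ensemble prediction as the next state, and the
    predicted reward penalized by ensemble disagreement (epistemic uncertainty)."""
    sa = torch.cat([s, a], dim=-1)
    preds = torch.stack([m(sa) for m in ensemble])        # (ensemble, batch, state_dim)
    next_s = preds.mean(dim=0)
    disagreement = preds.std(dim=0).mean(dim=-1)          # per-sample uncertainty estimate
    reward = reward_model(next_s).squeeze(-1) - penalty_coef * disagreement
    return next_s, reward

# Illustrative call with a stand-in reward head (an assumption for this sketch):
s, a = torch.randn(8, state_dim), torch.randn(8, action_dim)
reward_head = nn.Linear(state_dim, 1)
next_s, r = penalized_rollout_step(s, a, reward_head)
```

Penalizing rewards in this way biases the learned policy toward regions the models know well, which is one concrete way the "conservatism bias" discussed in the quote can be realized.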