2021
DOI: 10.48550/arxiv.2111.01587
Preprint

Procedural Generalization by Planning with Self-Supervised World Models

Abstract: One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero [60], a…
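
As a rough illustration of the idea the abstract describes (an agent that generalizes by making predictions with a learned internal model), the sketch below plans by unrolling stand-in representation, dynamics, and prediction functions. It is a simplification, not MuZero's actual method: MuZero searches with Monte-Carlo tree search over learned latent networks, while this toy performs an exhaustive depth-limited lookahead, and all names here (lookahead, h, g, f) are hypothetical.

from itertools import product

def lookahead(h, g, f, obs, n_actions, depth=3, gamma=0.99):
    """Score every action sequence up to `depth` entirely inside the
    learned model and return the first action of the best sequence."""
    s0 = h(obs)  # representation: observation -> latent state
    best_return, best_action = float("-inf"), 0
    for seq in product(range(n_actions), repeat=depth):
        s, ret, disc = s0, 0.0, 1.0
        for a in seq:
            s, r = g(s, a)      # dynamics: (latent, action) -> (latent, reward)
            ret += disc * r
            disc *= gamma
        ret += disc * f(s)      # prediction: latent -> value estimate
        if ret > best_return:
            best_return, best_action = ret, seq[0]
    return best_action

# Toy stand-ins for trained networks: a 1-D chain with the goal at 0.
h = lambda obs: obs
g = lambda s, a: (s + (1 if a == 1 else -1), -abs(s + (1 if a == 1 else -1)))
f = lambda s: -abs(s)
print(lookahead(h, g, f, obs=4, n_actions=2))  # 0, i.e. step toward the goal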

Cited by 4 publications (11 citation statements: 0 supporting, 11 mentioning, 0 contrasting)
References 23 publications
“…World models which can adapt to changing dynamics will be more challenging, but Seo et al. [148] give an initial example. Anand et al. [156] is the first example investigating how well standard MBRL approaches generalise, and we look forward to seeing more work in this area.…”
Section: Future Work On Methods For Generalisation (mentioning)
confidence: 97%
“…A second under-explored area is model-based reinforcement learning (MBRL) for generalisation. Most methods surveyed here are model-free, with notable exceptions being [150,126,148,156]. Learning a world model and combining it with planning methods can enable stronger forms of generalisation, especially to novel reward functions (if the reward function is available during planning).…”
Section: Future Work On Methods For Generalisation (mentioning)
confidence: 99%
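
The parenthetical point above (planning enables generalization to novel reward functions when the reward is available at planning time) can be made concrete with another small, hypothetical sketch: one fixed learned dynamics model is planned against two different task rewards without any retraining. The random-shooting planner and all names below are illustrative assumptions, not the method of [156].

import numpy as np

def imagined_return(dynamics, reward_fn, state, actions):
    """Roll a candidate action sequence through the learned dynamics model
    and score it with whichever reward function the current task supplies."""
    total = 0.0
    for a in actions:
        state = dynamics(state, a)
        total += reward_fn(state)
    return total

def plan(dynamics, reward_fn, state, n_actions, horizon=5, n_candidates=64):
    """Random-shooting planner: best first action among sampled sequences."""
    rng = np.random.default_rng(0)
    cands = [rng.integers(n_actions, size=horizon) for _ in range(n_candidates)]
    scores = [imagined_return(dynamics, reward_fn, state, c) for c in cands]
    return int(cands[int(np.argmax(scores))][0])

# One learned dynamics model (toy 1-D chain), two reward functions:
dynamics = lambda s, a: s + (1 if a == 1 else -1)
reach_zero = lambda s: -abs(s)      # task A: reach position 0
reach_five = lambda s: -abs(s - 5)  # task B: reach position 5
print(plan(dynamics, reach_zero, state=3, n_actions=2))  # most likely 0 (left)
print(plan(dynamics, reach_five, state=3, n_actions=2))  # most likely 1 (right)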
“…Prior work has explored improving generalization by training on a large number of levels [13], adding data augmentation to visual inputs [21, 22], knowledge transfer during training [23], and self-supervised world models [24]. Beyond zero-shot generalization, few prior works use Procgen to evaluate meta-RL, with one exception being Alver et al. [25], which showed that RL² failed to generalize on simplified Procgen games.…”
Section: Procgen Experiments (mentioning)
confidence: 99%