2019
DOI: 10.48550/arxiv.1907.01657
Preprint

Dynamics-Aware Unsupervised Discovery of Skills

Abstract: Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment. A good model can potentially enable planning algorithms to generate a large variety of behaviors and solve diverse tasks. However, learning an accurate model for complex dynamical systems is difficult, and even then, the model might not generalize well outside the distribution of states on which it was trained. In this work, we combine model-based learning with model-free learning of prim…
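The (truncated) abstract describes the core idea behind DADS: discover skills whose outcomes are easy to predict by maximizing the mutual information I(s'; z | s) between skills and transitions. As a rough illustration only, the sketch below computes a DADS-style variational intrinsic reward; the names `log_q` (a learned skill-dynamics log-density) and `prior_skills` (skills sampled from the prior p(z)) are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def dads_style_intrinsic_reward(s, z, s_next, log_q, prior_skills):
    """Sketch of a DADS-style intrinsic reward: log q(s'|s, z) minus the log
    of an average of q(s'|s, z_i) over L skills z_i drawn from the prior,
    approximating a variational lower bound on I(s'; z | s).

    `log_q(s, z, s_next)` is an assumed skill-dynamics model returning a
    log-density; `prior_skills` is a sequence of L skills sampled from p(z).
    """
    log_num = log_q(s, z, s_next)
    log_alts = np.array([log_q(s, zi, s_next) for zi in prior_skills])
    # log( (1/L) * sum_i q(s'|s, z_i) ), computed stably via logaddexp
    log_denom = np.logaddexp.reduce(log_alts) - np.log(len(prior_skills))
    return log_num - log_denom
```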

Cited by 47 publications (84 citation statements)
References 40 publications
“…There has been a variety of past exploration work in which agents are trained to seek out a uniform distribution of states or to learn distinct skills that take the agent to different parts of the state space, such as Sharma et al. (2019) and Eysenbach et al. (2018). While any of these off-the-shelf exploration algorithms could be used, in our work we experiment in the Minigrid environment, where the state space and dynamics are simple enough that it is possible to hard-code an oracle exploration policy which achieves uniform coverage over reachable states.…”
Section: Exploration Objective
confidence: 99%
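The Minigrid oracle this statement describes is straightforward to realize. Below is a minimal, hypothetical sketch (not the cited authors' implementation) that enumerates the reachable states of a gridworld by breadth-first search; an oracle exploration policy can then target a uniform distribution over that set.

```python
from collections import deque

def reachable_states(start, walls, width, height):
    """Enumerate every cell reachable from `start` via the four cardinal
    moves, treating cells in `walls` as blocked. Standard BFS."""
    frontier, seen = deque([start]), {start}
    while frontier:
        x, y = frontier.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in seen):
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

Sampling goal cells uniformly from this set yields the uniform coverage over reachable states the quote refers to.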
“…Although the task of precise model-based prediction is in general challenging [24], in this work we adopt model-based prediction only for action selection, and the target is action discovery rather than precise prediction. Since the dynamics models remain static throughout learning, this approach can be much more stable than TD-IC-INVASE.…”
Section: Static Approximation: Model-based Action Selection
confidence: 99%
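The distinction this statement draws, using a frozen model only to rank actions rather than for long-horizon prediction, can be made concrete with a one-step selection rule. This is a generic sketch under assumed interfaces (`dynamics_model` predicting the next state, `score_fn` scoring states), not the cited paper's code.

```python
import numpy as np

def select_action(state, candidate_actions, dynamics_model, score_fn):
    """One-step model-based action selection: predict s' = f(s, a) for each
    candidate action under a frozen dynamics model, then pick the action
    whose predicted next state scores highest. No multi-step rollout."""
    scores = [score_fn(dynamics_model(state, a)) for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```

Because the dynamics model stays fixed during learning, the ranking it induces is stable, which is the stability argument the quote makes.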
“…A large body of work on online skill discovery has been proposed as a means to improve exploration and sample complexity in online RL. For instance, Eysenbach et al. (2018); Sharma et al. (2019); Gregor et al. (2016); Warde-Farley et al. (2018); Liu et al. (2021) propose to learn a diverse set of skills by maximizing an information-theoretic objective. Online skill discovery is also commonly seen in a hierarchical framework that learns a continuous space (Vezhnevets et al., 2017; Hausman et al., 2018; Nachum et al., 2018a) or a discrete set of lower-level policies (Bacon et al., 2017; Stolle & Precup, 2002; Peng et al., 2019), upon which higher-level policies are trained to solve specific tasks.…”
Section: Related Work
confidence: 99%
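The hierarchical framework this statement sketches, a high-level policy selecting among discovered skills that a low-level policy executes, reduces to a simple two-timescale control loop. The sketch below assumes a classic Gym-style `env.step` returning `(obs, reward, done, info)`; `high_policy` and `low_policy` are hypothetical callables.

```python
def run_hierarchical_episode(env, high_policy, low_policy, k=10, max_steps=200):
    """Two-level control loop: every k steps the high-level policy picks a
    skill z; in between, the skill-conditioned low-level policy acts."""
    s = env.reset()
    total_reward, step = 0.0, 0
    while step < max_steps:
        z = high_policy(s)            # commit to one skill for k steps
        for _ in range(k):
            a = low_policy(s, z)      # skill-conditioned primitive action
            s, r, done, _ = env.step(a)
            total_reward += r
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward
```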