2021
DOI: 10.48550/arxiv.2106.03853
Preprint

DisTop: Discovering a Topological representation to learn diverse and rewarding skills

Abstract: The optimal way for a deep reinforcement learning (DRL) agent to explore is to learn a set of skills that achieves a uniform distribution of states. Following this, we introduce DisTop, a new model that simultaneously learns diverse skills and focuses on improving rewarding skills. DisTop progressively builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network and a goal-conditioned policy. Using this topology, a state-independent hierarchical policy can select wher…

Cited by 2 publications
(14 citation statements)
References 18 publications
“…Skew-fit manages to explore image-based environments very efficiently. As highlighted in [Aubret et al 2021], this ratio applied on a discrete number of skills, amount to rewards a Boltzmann goal-selection policy with:…”
Section: Proposing Diverse State-goals | Citation type: mentioning
confidence: 99%
“…It improves exploration in a two-dimensional latent embedding but the size of partitions may not scale well if the agent considers more latent dimensions. In contrast, DisTop [Aubret et al 2021] dynamically clusters a dynamic-aware embedding space using a variant of a Growing When Required [Marsland et al 2002]; they estimate the density of state according to how much its partition contains states and skew the distribution of sampled similarly to Skew-fit. HESS and DisTop demonstrate their ability to explore and navigate with an ant inside complex mazes without extrinsic rewards.…”
Section: Proposing Diverse State-goals | Citation type: mentioning
confidence: 99%
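
The citing statement above describes estimating state density from how many states each partition contains, then skewing the sampling distribution in the style of Skew-fit. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not DisTop's actual implementation: the function name `skewed_cluster_probs`, the example counts, and the choice of exponent `alpha = -1.0` are all hypothetical. The only borrowed ingredient is the Skew-fit style reweighting, in which a density estimate is raised to a negative power so that rarely visited regions are sampled more often.

```python
import random


def skewed_cluster_probs(counts, alpha=-1.0):
    """Return sampling probabilities over clusters, skewed toward sparse ones.

    counts: number of visited states assigned to each cluster (assumed > 0).
    alpha:  skew exponent (hypothetical default); alpha = 0 gives a uniform
            distribution over clusters, negative values favour sparse clusters.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("every cluster must contain at least one state")
    total = sum(counts)
    # Empirical density of each cluster: its share of all visited states.
    # Raising it to a negative power gives low-density clusters more weight.
    weights = [(c / total) ** alpha for c in counts]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical example: three clusters with very uneven visitation counts.
counts = [80, 15, 5]
probs = skewed_cluster_probs(counts, alpha=-1.0)

# Draw the index of the cluster in which to pick the next goal.
rng = random.Random(0)
chosen = rng.choices(range(len(counts)), weights=probs, k=1)[0]
```

With `alpha = -1.0` the least visited cluster receives the largest sampling probability, which is the qualitative behaviour the statement attributes to skewing. How DisTop sets the exponent and chooses goals within a cluster is not specified in the text above.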