2018
DOI: 10.48550/arxiv.1811.11359
Preprint

Unsupervised Control Through Non-Parametric Discriminative Rewards

Abstract: Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm to train agents to achieve perceptually specified goals using only a stream of observations and actions. Our agent simultaneously learns a goal-conditioned policy and a goal achievement reward function that measures how similar a state is to the goal state. This dual optimization leads to a co-operative game, g…
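The abstract describes the core mechanism at a high level: a goal-conditioned policy trained jointly with a learned goal-achievement reward. The following is a minimal illustrative sketch only, assuming an embedding-based similarity reward trained with a contrastive (discriminative) objective; the encoder architecture, cosine similarity, and decoy-goal sampling are assumptions, not the paper's exact method.

```python
# Minimal sketch (assumptions noted above), not the paper's exact architecture:
# achieved state and goal are mapped into an embedding space by a shared encoder;
# the goal-achievement reward is the similarity between the two embeddings, and
# the encoder is trained with a discriminative objective that ranks the true goal
# above decoy goals drawn from the agent's own experience.
import torch
import torch.nn.functional as F


class GoalEmbedder(torch.nn.Module):
    """Hypothetical encoder from flat observations to unit-norm embeddings."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def goal_achievement_reward(embedder, achieved_state, goal):
    """Cosine-similarity reward: how close is the achieved state to the goal?"""
    with torch.no_grad():
        return (embedder(achieved_state) * embedder(goal)).sum(-1)


def discriminative_loss(embedder, achieved_state, goal, decoy_goals):
    """Rank the true goal above decoys sampled from the agent's experience."""
    candidates = torch.cat([goal.unsqueeze(0), decoy_goals], dim=0)  # (K+1, obs_dim)
    logits = embedder(candidates) @ embedder(achieved_state)         # (K+1,)
    target = torch.zeros(1, dtype=torch.long)                        # true goal at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```

In this reading, the "co-operative game" arises because the policy is rewarded through the same embeddings that the discriminative loss shapes, so improvements in either component benefit the other.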

Cited by 20 publications (31 citation statements) | References 18 publications
“…Our method learns distances between full image states (3072-dimensional) while HER uses 3-dimensional goals, a difference of two orders of magnitude in dimensionality. This difficulty of learning complex image-based goals is further corroborated in prior work (Nair et al., 2018; Pong et al., 2019; Warde-Farley et al., 2018).…”
Section: Vision-based Real-world Manipulation From Human Preferences (supporting)
confidence: 66%
“…Our method is also well suited for fully unsupervised learning, in which case DDL uses the distance function to propose goals for unsupervised skill discovery. Prior work on unsupervised reinforcement learning has proposed choosing goals based on a variety of unsupervised criteria, typically with the aim of attaining broad state coverage (Nair et al., 2018; Florensa et al., 2018; Eysenbach et al., 2018; Warde-Farley et al., 2018; Pong et al., 2019). Our method instead repeatedly chooses the most distant state as the goal, which produces rapid exploration and quickly discovers relatively complex skills.…”
Section: Related Work (mentioning)
confidence: 99%
“…With the novelty and potential measures, we develop a subgoal selection strategy to improve exploration in HRL. There are numerous goal selection strategies in the multi-goal RL domain [42] as well, including sampling diverse goals uniformly from a buffer [43,44], sampling goals from the achieved goal distribution [45], sampling goals of intermediate difficulty from a generative model [46], and selecting goals in sparsely explored areas [47]. However, those methods use predefined or pretrained goal spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…A large body of work focusing on online skill discovery has been proposed as a means to improve exploration and sample complexity in online RL. For instance, Eysenbach et al. (2018); Sharma et al. (2019); Gregor et al. (2016); Warde-Farley et al. (2018); Liu et al. (2021) propose to learn a diverse set of skills by maximizing an information-theoretic objective. Online skill discovery is also commonly seen in a hierarchical framework that learns a continuous space (Vezhnevets et al., 2017; Hausman et al., 2018; Nachum et al., 2018a) or a discrete set of lower-level policies (Bacon et al., 2017; Stolle & Precup, 2002; Peng et al., 2019), upon which higher-level policies are trained to solve specific tasks.…”
Section: Related Work (mentioning)
confidence: 99%
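The "information-theoretic objective" mentioned in the excerpt above is commonly a mutual-information term between skills and visited states, estimated with a learned skill classifier. Below is a minimal sketch of that general recipe under stated assumptions (a discrete skill set, a uniform skill prior, and reward r = log q(z|s) − log p(z)); it is illustrative and not the exact objective of any specific cited paper.

```python
# Minimal sketch (assumed) of a discriminator-based intrinsic reward in the
# spirit of mutual-information skill discovery: a classifier q(z | s) learns to
# infer which skill z produced state s, and the skill-conditioned policy is
# rewarded with log q(z | s) - log p(z) under a uniform skill prior.
import math

import torch
import torch.nn.functional as F


class SkillDiscriminator(torch.nn.Module):
    def __init__(self, obs_dim: int, num_skills: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, num_skills),
        )

    def forward(self, state):
        return self.net(state)  # unnormalized logits over skills


def intrinsic_reward(disc, state, skill_id, num_skills):
    """r = log q(z|s) - log p(z), with p(z) uniform over num_skills skills."""
    with torch.no_grad():
        log_q = F.log_softmax(disc(state), dim=-1)[skill_id]
    return log_q + math.log(num_skills)


def discriminator_loss(disc, states, skill_ids):
    """Standard classification loss: predict the skill from the visited state."""
    return F.cross_entropy(disc(states), skill_ids)
```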