MADE: Exploration via Maximizing Deviation from Explored Regions

Zhang, Tianjun; Rashidinejad, Paria; Jiao, Jiantao; Tian, Yuandong; Russell, Stuart

doi:10.48550/arxiv.2106.10268

Cited by 1 publication (1 citation statement)

References 62 publications


“…Effectively solving goal-conditioned RL problems requires performing good exploration. In the goal-conditioned setting, the quality of exploration depends on how goals are sampled, a problem studied in many prior methods [6,10,11,27,28,30,38,39]. These methods craft objectives that try to optimize for learning progress, and the resulting algorithms achieve good results across a range of environments.…”

Section: Related Work (mentioning, confidence: 99%)

C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks

Zhang, Eysenbach, Salakhutdinov, et al. (2021). Preprint. Self-citation.


Goal-conditioned reinforcement learning (RL) can solve tasks in a wide range of domains, including navigation and manipulation, but learning to reach distant goals remains a central challenge to the field. Learning to reach such goals is particularly hard without any offline data, expert demonstrations, or reward shaping. In this paper, we propose an algorithm to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states. Our algorithm, Classifier-Planning (C-Planning), frames the learning of goal-conditioned policies as expectation maximization: the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints. Unlike prior methods that combine goal-conditioned RL with graph search, ours performs search only during training and not testing, significantly decreasing the compute costs of deploying the learned policy. Empirically, we demonstrate that our method is more sample efficient than prior methods. Moreover, it is able to solve very long-horizon manipulation and navigation tasks that prior goal-conditioned methods and methods based on graph search fail to solve.

* Equal contribution. Code and videos of our results: https://ben-eysenbach.github.io/c-planning/
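The E-step described in the abstract (planning a sequence of waypoints by graph search) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that previously visited states are stored in a list, that some distance estimate between two states is available (here plain Euclidean distance via `math.dist` stands in for whatever learned estimate the method actually uses), and that an edge exists only when that distance is below a hypothetical threshold `max_edge`. The function name `plan_waypoints` is likewise hypothetical. It runs Dijkstra's algorithm and returns the indices of the waypoint states from start to goal.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[float, ...]


def plan_waypoints(
    states: Sequence[State],
    start: int,
    goal: int,
    distance: Callable[[State, State], float],
    max_edge: float,
) -> List[int]:
    """Hypothetical E-step sketch: Dijkstra over a graph of stored states.

    An edge i -> j exists when distance(states[i], states[j]) <= max_edge,
    i.e. the policy is *assumed* able to reach j from i directly.
    Returns the state indices on the shortest path, or [] if unreachable.
    """
    best: Dict[int, float] = {start: 0.0}   # best known cost to each node
    parent: Dict[int, int] = {}             # predecessor on best path
    frontier = [(0.0, start)]               # min-heap of (cost, node)
    visited = set()
    while frontier:
        cost, i = heapq.heappop(frontier)
        if i in visited:
            continue                        # stale heap entry
        visited.add(i)
        if i == goal:
            break
        for j in range(len(states)):
            if j == i or j in visited:
                continue
            d = distance(states[i], states[j])
            if d > max_edge:
                continue                    # too far to be a single hop
            new_cost = cost + d
            if new_cost < best.get(j, math.inf):
                best[j] = new_cost
                parent[j] = i
                heapq.heappush(frontier, (new_cost, j))
    if goal not in visited:
        return []                           # goal unreachable under threshold
    path = [goal]
    while path[-1] != start:                # walk predecessors back to start
        path.append(parent[path[-1]])
    return path[::-1]


# Usage: five states on a line; only neighbours are within one hop.
states = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
waypoints = plan_waypoints(states, 0, 4, math.dist, max_edge=1.5)
print(waypoints)  # [0, 1, 2, 3, 4]
```

In an M-step analogous to the one the abstract describes, the goal-conditioned policy would then be trained to reach these intermediate states (for example, `waypoints[1]` as the next sub-goal) instead of the distant final goal. That training loop is omitted here because the abstract does not specify it.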

