2021
DOI: 10.48550/arxiv.2110.12080
Preprint

C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks

Abstract: Goal-conditioned reinforcement learning (RL) can solve tasks in a wide range of domains, including navigation and manipulation, but learning to reach distant goals remains a central challenge to the field. Learning to reach such goals is particularly hard without offline data, expert demonstrations, or reward shaping. In this paper, we propose an algorithm to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states. Our algorithm, …
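The abstract's core idea — searching over previously visited states to propose intermediate training goals of intermediate difficulty — can be sketched in a few lines. This is an illustrative approximation, not the paper's actual procedure (the truncated abstract does not specify it); the distance estimate, the waypoint-selection rule, and all names (propose_waypoint, curriculum_goal, dist) are assumptions introduced here.

```python
import numpy as np

# Hypothetical sketch: propose an intermediate training goal ("waypoint")
# between the start state and a distant goal, using a distance estimate over
# states already stored in a replay buffer. Names and the selection rule are
# illustrative, not the paper's actual method.

def propose_waypoint(start, goal, buffer_states, dist):
    """Pick the buffered state that best splits the start->goal task.

    dist(a, b) is any estimate of how hard it is to reach b from a
    (e.g., derived from a goal-conditioned value function). The chosen
    waypoint minimizes the harder of its two legs, so neither sub-task
    is as difficult as the original start->goal task.
    """
    costs = [max(dist(start, w), dist(w, goal)) for w in buffer_states]
    return buffer_states[int(np.argmin(costs))]

def curriculum_goal(start, goal, buffer_states, dist, threshold=1.0):
    """If the true goal is still too distant, train on a waypoint instead."""
    if dist(start, goal) <= threshold:
        return goal  # close enough: train on the true goal directly
    return propose_waypoint(start, goal, buffer_states, dist)

if __name__ == "__main__":
    # Toy 2-D example; Euclidean distance stands in for a learned estimate.
    rng = np.random.default_rng(0)
    states = list(rng.uniform(0, 10, size=(50, 2)))
    dist = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    wp = curriculum_goal(np.array([0.0, 0.0]), np.array([10.0, 10.0]), states, dist)
    print("intermediate training goal:", wp)
```

As training progresses and the policy reaches waypoints reliably, the same rule naturally proposes waypoints closer to the true goal, which is what makes the procedure a curriculum.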

Cited by 1 publication (1 citation statement). References 20 publications.
“…For instance, in Hierarchical RL, a high-level policy may construct a sequence of goals which must be successively achieved by a low-level goal-conditioned policy [4,5,6,7,8,9]. Other approaches combine goal-conditioned value learning with goal-level planning to achieve distant goals [10,11,12,13,14]. However, to exploit an accurate value function, these planning methods rely on at least one of the following two assumptions: a dense and informative reward function is available, or the agent can be reset into hard-to-attain states.…”
Section: A. Sequential Goal Reaching
confidence: 99%
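The second family the citing authors describe — learning a goal-conditioned value function and then planning at the level of goals — can be illustrated with a small sketch. This is not the cited papers' code: it assumes a pairwise reachability cost between replay-buffer states (here faked with Euclidean distance in place of a learned value estimate) and runs Dijkstra's algorithm to extract a subgoal sequence for a low-level policy; the function and parameter names (plan_subgoals, max_edge) are hypothetical.

```python
import heapq
import numpy as np

# Illustrative sketch of "goal-conditioned value learning + goal-level
# planning". We assume a learned pairwise cost estimate cost[i, j] between
# buffered states and plan a shortest path through them; each node on the
# path becomes a subgoal handed to a low-level goal-reaching policy.

def plan_subgoals(cost, start_idx, goal_idx, max_edge=3.0):
    """Dijkstra over buffered states; edges longer than max_edge are pruned
    because value-based distance estimates are unreliable for distant pairs."""
    n = cost.shape[0]
    dist = np.full(n, np.inf)
    prev = np.full(n, -1)
    dist[start_idx] = 0.0
    pq = [(0.0, start_idx)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        if u == goal_idx:
            break
        for v in range(n):
            if v == u or cost[u, v] > max_edge:
                continue
            nd = d + cost[u, v]
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the subgoal sequence from goal back to start.
    path, v = [], goal_idx
    while v != -1:
        path.append(v)
        v = int(prev[v])
    return path[::-1] if path[-1] == start_idx else None  # None: unreachable

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.uniform(0, 10, size=(40, 2))
    cost = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print("subgoal indices:", plan_subgoals(cost, start_idx=0, goal_idx=1))
```

The edge-pruning threshold reflects the caveat in the citation statement: without a dense reward or resets into hard-to-attain states, the value function is only trustworthy locally, so planning is restricted to short, reliable hops.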