2021
DOI: 10.48550/arxiv.2110.12080
Preprint

C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks

Abstract: Goal-conditioned reinforcement learning (RL) can solve tasks in a wide range of domains, including navigation and manipulation, but learning to reach distant goals remains a central challenge to the field. Learning to reach such goals is particularly hard without offline data, expert demonstrations, or reward shaping. In this paper, we propose an algorithm to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states. Our algorithm, …
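The abstract's core idea — searching over previously visited states to propose intermediate training goals of intermediate difficulty — can be sketched in a few lines. This is an illustrative approximation, not the paper's actual procedure (the truncated abstract does not specify it); the distance estimate, the waypoint-selection rule, and all names (propose_waypoint, curriculum_goal, dist) are assumptions introduced here.

```python
import numpy as np

# Hypothetical sketch: propose an intermediate training goal ("waypoint")
# between the start state and a distant goal, using a distance estimate over
# states already stored in a replay buffer. Names and the selection rule are
# illustrative, not the paper's actual method.

def propose_waypoint(start, goal, buffer_states, dist):
    """Pick the buffered state that best splits the start->goal task.

    dist(a, b) is any estimate of how hard it is to reach b from a
    (e.g., derived from a goal-conditioned value function). The chosen
    waypoint minimizes the harder of its two legs, so neither sub-task
    is as difficult as the original start->goal task.
    """
    costs = [max(dist(start, w), dist(w, goal)) for w in buffer_states]
    return buffer_states[int(np.argmin(costs))]

def curriculum_goal(start, goal, buffer_states, dist, threshold=1.0):
    """If the true goal is still too distant, train on a waypoint instead."""
    if dist(start, goal) <= threshold:
        return goal  # close enough: train on the true goal directly
    return propose_waypoint(start, goal, buffer_states, dist)

if __name__ == "__main__":
    # Toy 2-D example; Euclidean distance stands in for a learned estimate.
    rng = np.random.default_rng(0)
    states = list(rng.uniform(0, 10, size=(50, 2)))
    dist = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    wp = curriculum_goal(np.array([0.0, 0.0]), np.array([10.0, 10.0]), states, dist)
    print("intermediate training goal:", wp)
```

As training progresses and the policy reaches waypoints reliably, the same rule naturally proposes waypoints closer to the true goal, which is what makes the procedure a curriculum.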

Cited by 1 publication (1 citation statement). References 20 publications.
“…For instance, in Hierarchical RL, a high-level policy may construct a sequence of goals which must be successively achieved by a low-level goal-conditioned policy [4,5,6,7,8,9]. Other approaches combine goal-conditioned value learning with goal-level planning to achieve distant goals [10,11,12,13,14]. However, to exploit an accurate value function, these planning methods rely on at least one of the following two assumptions: a dense and informative reward function is available, or the agent can be reset into hard-to-attain states.…”
Section: A. Sequential Goal Reaching
confidence: 99%
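The second family the citing authors describe — learning a goal-conditioned value function and then planning at the level of goals — can be illustrated with a small sketch. This is not the cited papers' code: it assumes a pairwise reachability cost between replay-buffer states (here faked with Euclidean distance in place of a learned value estimate) and runs Dijkstra's algorithm to extract a subgoal sequence for a low-level policy; the function and parameter names (plan_subgoals, max_edge) are hypothetical.

```python
import heapq
import numpy as np

# Illustrative sketch of "goal-conditioned value learning + goal-level
# planning". We assume a learned pairwise cost estimate cost[i, j] between
# buffered states and plan a shortest path through them; each node on the
# path becomes a subgoal handed to a low-level goal-reaching policy.

def plan_subgoals(cost, start_idx, goal_idx, max_edge=3.0):
    """Dijkstra over buffered states; edges longer than max_edge are pruned
    because value-based distance estimates are unreliable for distant pairs."""
    n = cost.shape[0]
    dist = np.full(n, np.inf)
    prev = np.full(n, -1)
    dist[start_idx] = 0.0
    pq = [(0.0, start_idx)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        if u == goal_idx:
            break
        for v in range(n):
            if v == u or cost[u, v] > max_edge:
                continue
            nd = d + cost[u, v]
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the subgoal sequence from goal back to start.
    path, v = [], goal_idx
    while v != -1:
        path.append(v)
        v = int(prev[v])
    return path[::-1] if path[-1] == start_idx else None  # None: unreachable

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.uniform(0, 10, size=(40, 2))
    cost = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print("subgoal indices:", plan_subgoals(cost, start_idx=0, goal_idx=1))
```

The edge-pruning threshold reflects the caveat in the citation statement: without a dense reward or resets into hard-to-attain states, the value function is only trustworthy locally, so planning is restricted to short, reliable hops.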