2018
DOI: 10.48550/arxiv.1802.09464
Preprint

Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

Abstract: The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The …
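The multi-goal setup described in the abstract exposes the goal as an extra input to the agent: observations are dictionaries holding the proprioceptive state, the currently achieved goal, and the desired goal, and the sparse binary reward can be recomputed for any goal. A minimal interaction sketch, assuming gym and its robotics environments are installed (the "-v1" version suffix is an assumption and varies across releases):

```python
import gym

# FetchReach: move the gripper to a 3D target; reward is sparse and binary.
env = gym.make("FetchReach-v1")
obs = env.reset()

# Goal-conditioned observations are dicts with three keys.
state = obs["observation"]        # proprioceptive state of the arm
achieved = obs["achieved_goal"]   # goal currently achieved (gripper position)
desired = obs["desired_goal"]     # target the agent is asked to reach

# Rewards are recomputable for arbitrary goals, which enables hindsight
# relabeling: 0 if the goal is met within a tolerance, -1 otherwise.
r = env.compute_reward(achieved, desired, {})
```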

Cited by 110 publications (213 citation statements)
References 14 publications
Citation types: 2 supporting, 211 mentioning, 0 contrasting
“…We represent the same problem setup, that of multiple tasks with multiple goals per task, in a robotic continuous space environment. We choose to adapt the FetchReach-v0 environment from Plappert et al (2018) in order to train an agent to move a robotic gripper close to a set of target positions in the correct order. We represent all the multiple goal positions in the input space by 3D coordinates, sampled around the gripper starting position.…”
Section: A25 FetchReach Experiments: Implementation Details
Mentioning, confidence: 99%
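As a concrete illustration of the goal sampling described above, a hypothetical helper that draws a set of 3D targets around the gripper's starting position (the function name, offset range, and start coordinates are assumptions, not values from the cited work):

```python
import numpy as np

def sample_goal_sequence(gripper_start, n_goals=3, radius=0.15, rng=None):
    # Draw n_goals 3D targets uniformly in a cube of half-width `radius`
    # centered on the gripper's starting position.
    rng = rng or np.random.default_rng()
    offsets = rng.uniform(-radius, radius, size=(n_goals, 3))
    return np.asarray(gripper_start) + offsets

# Approximate initial gripper position of the Fetch arm in these tasks.
goals = sample_goal_sequence(gripper_start=[1.34, 0.75, 0.53])
```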
“…This domain is adapted from the well known gym robotics FetchPickAndPlace-v0 environment [38]. The following modifications were made: 1) 3 additional blocks were introduced, with different colours, and a goal pad, 2) object spawn locations were not randomized and were instantiated equidistantly around the goal pad, see Fig.…”
Section: Environments
Mentioning, confidence: 99%
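The equidistant spawn layout mentioned above is simple to reproduce; a hypothetical geometry helper (the names and radius are assumptions):

```python
import numpy as np

def equidistant_spawns(goal_pad_xy, radius=0.1, n_blocks=4):
    # Place n_blocks spawn points evenly on a circle around the goal pad.
    angles = 2 * np.pi * np.arange(n_blocks) / n_blocks
    offsets = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.asarray(goal_pad_xy) + offsets

spawns = equidistant_spawns(goal_pad_xy=[1.3, 0.75])  # one spawn per block
```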
“…Prior works have shown that a standard off-policy algorithm DDPG [6] combined with an implicit curriculum method HER [7] can learn dexterous manipulation policies to control an object with simple geometries, such as a cube or an egg [8]. However, whether a single policy can work well on a large number of geometrically-diverse objects has been under-explored.…”
Section: Geometry-aware Multi-task Learning
Mentioning, confidence: 99%
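The DDPG + HER recipe referenced here is available off the shelf. A minimal sketch using stable-baselines3 rather than the cited works' own implementations (an assumption; exact environment IDs and keyword arguments vary with library versions):

```python
import gym
from stable_baselines3 import DDPG, HerReplayBuffer

# Goal-conditioned env with dict observations, as in the Fetch/Hand suites.
env = gym.make("FetchPickAndPlace-v1")

model = DDPG(
    "MultiInputPolicy",  # handles dict observation spaces
    env,
    replay_buffer_class=HerReplayBuffer,
    # Relabel stored transitions with goals achieved later in the episode.
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```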
“…Because the object geometries are vastly different from each other, leading to different levels of difficulties for the manipulation policies, a random split may not ensure fair evaluations. Therefore, we use the same DDPG + HER algorithm to train an oracle single-task RL policy for each object, following the setup from [8]. Then we split the objects according to the success rate of its oracle, ensuring that the training and held-out objects have similar difficulties on average.…”
Section: B Train/Test Split
Mentioning, confidence: 99%
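The difficulty-balanced split can be sketched directly from per-object oracle success rates; a hypothetical reconstruction of the balancing idea, not the cited paper's exact procedure:

```python
def split_by_difficulty(success_rates, test_fraction=0.2):
    # Sort objects from hardest (lowest oracle success) to easiest, then
    # deal every k-th object to the test set so both splits span the full
    # difficulty range and have similar average success rates.
    ranked = sorted(success_rates, key=success_rates.get)
    k = max(1, round(1 / test_fraction))
    test = set(ranked[::k])
    train = [obj for obj in ranked if obj not in test]
    return train, sorted(test)

train, test = split_by_difficulty(
    {"cube": 0.9, "egg": 0.7, "pen": 0.3, "torus": 0.5, "mug": 0.6}
)
```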