2021 20th International Conference on Advanced Robotics (ICAR)
DOI: 10.1109/icar53236.2021.9659344
Towards Hierarchical Task Decomposition using Deep Reinforcement Learning for Pick and Place Subtasks

Abstract: Deep Reinforcement Learning (DRL) is emerging as a promising approach to generating adaptive behaviors for robotic platforms. However, a major drawback of DRL is its data-hungry training regime, which requires millions of trial-and-error attempts, impractical when running experiments on physical robotic systems. To address this issue, we propose a multi-subtask reinforcement learning method in which complex tasks are decomposed manually into low-level subtasks by leveraging human domain knowledge. These subtask…
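The decomposition the abstract describes can be pictured concretely. Below is a minimal, hypothetical sketch (the class names, subtask names, and environment interface are illustrative assumptions, not the authors' code): each manually defined subtask gets its own DRL-trained policy, and the full pick-and-place task runs the subtasks in sequence, each one handing off once its own success predicate fires.

```python
# Sketch of manual task decomposition into per-subtask policies.
# All names (SubtaskPolicy, run_task, ...) and the env interface
# are illustrative assumptions, not the paper's implementation.
from typing import Callable, List
import numpy as np

Observation = np.ndarray
Action = np.ndarray

class SubtaskPolicy:
    """One DRL policy per subtask; `actor` stands in for a trained
    network (e.g. a DDPG or SAC actor head)."""
    def __init__(self, name: str,
                 actor: Callable[[Observation], Action],
                 done: Callable[[Observation], bool]):
        self.name = name
        self.actor = actor    # maps observation -> low-level action
        self.done = done      # subtask-specific success predicate

    def act(self, obs: Observation) -> Action:
        return self.actor(obs)

def run_task(env, subtasks: List[SubtaskPolicy], max_steps: int = 500):
    """Execute a complex task (e.g. reach -> grasp -> move -> release)
    as a fixed sequence of subtask policies."""
    obs = env.reset()
    for policy in subtasks:
        for _ in range(max_steps):
            obs, reward, terminated, info = env.step(policy.act(obs))
            if policy.done(obs):   # hand control to the next subtask
                break
            if terminated:
                return obs
    return obs
```

Because each policy only has to solve its own short-horizon subtask, each can be trained with far fewer environment interactions than a single end-to-end policy for the whole task, which is the data-efficiency argument the abstract makes.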

Cited by 25 publications (26 citation statements)
References 24 publications (29 reference statements)
“…Hierarchical reinforcement learning (HRL) [13, 14] divides agents' tasks into subtasks to be learned by different agents. This simplifies the problem each agent must solve, making their behaviour easier to interpret and thereby easier to characterize.…”
Section: Agent Characterization (mentioning)
confidence: 99%
“…As a remedy, the broad area of Hierarchical Reinforcement Learning (HRL) attempts to decompose RL problems into multiple levels of abstraction: temporal, spatial, or otherwise. Many works deploy separate policies over different time horizons and action spaces (Barto & Mahadevan, 2003; Levy et al., 2017; Yang et al., 2020; Marzari et al., 2021). Temporal abstraction in planning can be traced back at least to Sutton et al. (1999), where options were introduced to refer to lower-level policies.…”
Section: Related Work (mentioning)
confidence: 99%
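The options formalism this excerpt cites (Sutton et al., 1999) defines a temporally extended action as a triple: an initiation set I, an intra-option policy π, and a termination condition β. A compact sketch of that structure, with an assumed toy environment interface (names are illustrative, not from the cited work's code):

```python
# Minimal sketch of an option (Sutton et al., 1999): a temporally
# extended action given by an initiation set I, an intra-option
# policy pi, and a termination condition beta. The env interface
# (step returning state, reward, done) is an assumption.
import random
from dataclasses import dataclass
from typing import Callable

State = int
Action = int

@dataclass
class Option:
    initiate: Callable[[State], bool]   # I: states where the option may start
    policy: Callable[[State], Action]   # pi: action choice while it runs
    beta: Callable[[State], float]      # beta: termination probability in s

def execute_option(env, option: Option, s: State):
    """Run one option to termination; returns (state, total reward, steps)."""
    assert option.initiate(s), "option not available in this state"
    total, steps = 0.0, 0
    while True:
        s, r, done = env.step(option.policy(s))
        total += r
        steps += 1
        if done or random.random() < option.beta(s):
            return s, total, steps
```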
“…Larger problems are solved by choreographing these subtasks through an orchestration agent that learns the high-level dynamics of its environment [25]. HRL has, for instance, been used to control a robotic arm: while low-level agents learned simple tasks such as moving forward/backward or picking up/placing down, an orchestration agent learned to retrieve objects on a surface by choreographing these tasks [26, 27]. The agents were not only efficient at learning, but their policies were also more easily interpreted by human experts.…”
Section: Background and Related Work (mentioning)
confidence: 99%
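The orchestration pattern this excerpt describes can be viewed as a small discrete RL problem at the top level: the high-level agent's action space is the set of subtask policies, and it learns which one to invoke from an abstract state. A hypothetical tabular sketch of that idea (the class and its interface are assumptions, not the cited method):

```python
# Hypothetical orchestration agent: a tabular Q-learner whose
# "actions" are indices of entire subtask policies (pick, place,
# move forward/backward, ...). Names and hyperparameters are
# illustrative, not taken from [25]-[27].
from collections import defaultdict
import random

class Orchestrator:
    def __init__(self, n_subtasks: int, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_subtasks)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.n = n_subtasks

    def choose(self, abstract_state):
        """Epsilon-greedy choice over which subtask policy to run next."""
        if random.random() < self.eps:
            return random.randrange(self.n)
        return max(range(self.n), key=lambda a: self.q[abstract_state][a])

    def update(self, s, subtask, reward, s_next):
        """One-step Q-learning update at the level of subtask outcomes."""
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][subtask] += self.alpha * (target - self.q[s][subtask])
```

Because the high-level table is indexed by a handful of abstract states and subtask choices rather than raw sensorimotor signals, its learned policy stays small enough for a human expert to inspect, which is the interpretability benefit the excerpt highlights.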