2019
DOI: 10.1007/978-3-030-29911-8_7
Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have a…
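
To make the broadcast-intention scheme from the abstract concrete, here is a minimal Python sketch. All names (`Option`, `OptionAgent`, `broadcast`) are hypothetical illustrations, not the paper's implementation: each agent publishes the option it is currently executing, and under standard static termination it keeps executing that option until the option finishes, which is precisely the inflexibility the abstract points to.

```python
import random

class Option:
    """A toy option: repeat one primitive action for a fixed duration."""

    def __init__(self, name, action, duration):
        self.name = name
        self.action = action
        self.duration = duration
        self.steps_left = duration

    def reset(self):
        self.steps_left = self.duration
        return self

    def act(self, obs):
        self.steps_left -= 1
        return self.action

    def terminated(self, obs):
        return self.steps_left <= 0


class OptionAgent:
    """Hierarchical agent that broadcasts its currently executed option."""

    def __init__(self, agent_id, options):
        self.agent_id = agent_id
        self.options = options
        self.current = None

    def broadcast(self):
        # Publish the current intention so other agents can respond to it.
        name = self.current.name if self.current else None
        return (self.agent_id, name)

    def step(self, obs, others_intentions):
        # Static termination: the option runs to completion even if the
        # broadcasts of the other agents change mid-way -- the source of
        # the inflexibility discussed in the abstract.
        if self.current is None or self.current.terminated(obs):
            # Stand-in for an option-value maximiser conditioned on the
            # other agents' broadcast intentions.
            self.current = random.choice(self.options).reset()
        return self.current.act(obs)


# Two agents exchanging intentions in a toy loop.
agents = [OptionAgent(i, [Option("left", -1, 3), Option("right", +1, 3)])
          for i in range(2)]
for t in range(6):
    intentions = [a.broadcast() for a in agents]
    actions = [a.step(obs=None, others_intentions=intentions) for a in agents]
```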

Cited by 11 publications (5 citation statements)
References 22 publications
“…In the specific case of multi-agent systems, after pioneering work (Makar et al., 2001), others have explored master-slave architectures (Kong et al., 2017), feudal multi-agent hierarchies (Ahilan & Dayan, 2019), temporal abstraction (Tang et al., 2018), dynamic termination (Han et al., 2019) and skill discovery (Yang et al., 2019). The field of planning on decentralised partially observable Markov decision processes (Oliehoek & Amato, 2016) has also seen work leveraging macro-actions (Amato et al., 2019).…”
Section: Hierarchical Reinforcement Learning (mentioning, confidence: 99%)
“…However, despite the advantages brought by using options, their temporally-extended nature means that agents' responses can be inconsistent when the environment or other agents' behaviour changes. To tackle this problem, Han et al. (2019) proposed a dynamic termination scheme which allows an agent to flexibly terminate its current option. Both option-critic and our approach use a pool of actors, but whereas in the former the actors model options, in the latter they model policies, preventing inconsistent agent behaviours.…”
Section: Related Work (mentioning, confidence: 99%)
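
For contrast with static termination, the dynamic termination scheme quoted above can be sketched as a high-level controller that scores "terminate now" as one more choice at every step, so an agent can abandon its option early when other agents' behaviour changes. This is a hedged sketch under assumed names (`q_values`, `TERMINATE`, `continue_option`), not the exact formulation of Han et al. (2019).

```python
import random

TERMINATE = "terminate"  # hypothetical sentinel for "end the option now"

def continue_option(q_values, obs_key, current_option, epsilon=0.1):
    """Decide at every step whether to keep the current option.

    Illustrative sketch only: q_values is an assumed dict mapping
    (obs_key, choice) -> estimated value, where choice is an option name
    or the TERMINATE sentinel. Scoring termination as just another choice
    is what lets the agent cut an option short when circumstances change.
    """
    choices = [current_option, TERMINATE]
    if random.random() < epsilon:
        choice = random.choice(choices)
    else:
        choice = max(choices, key=lambda c: q_values.get((obs_key, c), 0.0))
    return choice != TERMINATE  # False => caller re-plans a new option
```

When `continue_option` returns False, the caller would re-run its option-selection step, for instance the selection step from the earlier sketch, conditioned on the latest broadcast intentions.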
“…Recent work converts the MARL problem to a single-agent setting by using a single Q-function across all agents [Lowe et al. 2017]. Other recent work has begun to combine MARL and HRL, but is limited to simple discrete grid environments, uses additional methods to stabilize the optimization, and includes communication [Han et al. 2019; Tang et al. 2018]. Instead, our work tackles multi-agent articulated humanoid simulation through a combination of goal-conditioned learning and partial parameter sharing: assuming that all agents share task-agnostic locomotion and optimize similar goals allows us to keep the modularity and autonomy benefits of decentralized methods while significantly reducing the model size.…”
Section: Multi-agent Reinforcement Learning (mentioning, confidence: 99%)
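
The goal-conditioned learning with partial parameter sharing described in this last statement can be sketched briefly: a single task-agnostic low-level network is shared across all agents, and only the per-agent goal input differs. The PyTorch sketch below uses hypothetical names (`SharedLocomotion`) and assumed dimensions; it is not the cited authors' model.

```python
import torch
import torch.nn as nn

class SharedLocomotion(nn.Module):
    """Task-agnostic low-level controller shared by every agent (sketch)."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal):
        # Goal-conditioned: the same weights serve all agents; only the
        # goal input differs per agent.
        return self.net(torch.cat([obs, goal], dim=-1))

# One shared module, many agents: each agent supplies its own goal.
shared = SharedLocomotion(obs_dim=10, goal_dim=3, act_dim=4)
obs = torch.randn(2, 10)   # observations for two agents
goals = torch.randn(2, 3)  # per-agent goals
actions = shared(obs, goals)
```

Sharing the weights keeps execution decentralised while shrinking the total parameter count, which is the trade-off the statement highlights.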