2019
DOI: 10.1007/978-3-030-29911-8_7
Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have a…
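
To make the broadcast-intention scheme from the abstract concrete, here is a minimal Python sketch. All names (`Option`, `OptionAgent`, `broadcast`) are hypothetical illustrations, not the paper's implementation: each agent publishes the option it is currently executing, and under standard static termination it keeps executing that option until the option finishes, which is precisely the inflexibility the abstract points to.

```python
import random

class Option:
    """A toy option: repeat one primitive action for a fixed duration."""

    def __init__(self, name, action, duration):
        self.name = name
        self.action = action
        self.duration = duration
        self.steps_left = duration

    def reset(self):
        self.steps_left = self.duration
        return self

    def act(self, obs):
        self.steps_left -= 1
        return self.action

    def terminated(self, obs):
        return self.steps_left <= 0


class OptionAgent:
    """Hierarchical agent that broadcasts its currently executed option."""

    def __init__(self, agent_id, options):
        self.agent_id = agent_id
        self.options = options
        self.current = None

    def broadcast(self):
        # Publish the current intention so other agents can respond to it.
        name = self.current.name if self.current else None
        return (self.agent_id, name)

    def step(self, obs, others_intentions):
        # Static termination: the option runs to completion even if the
        # broadcasts of the other agents change mid-way -- the source of
        # the inflexibility discussed in the abstract.
        if self.current is None or self.current.terminated(obs):
            # Stand-in for an option-value maximiser conditioned on the
            # other agents' broadcast intentions.
            self.current = random.choice(self.options).reset()
        return self.current.act(obs)


# Two agents exchanging intentions in a toy loop.
agents = [OptionAgent(i, [Option("left", -1, 3), Option("right", +1, 3)])
          for i in range(2)]
for t in range(6):
    intentions = [a.broadcast() for a in agents]
    actions = [a.step(obs=None, others_intentions=intentions) for a in agents]
```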

Cited by 11 publications (5 citation statements)
References 22 publications
“…In the specific case of multi-agent systems, after pioneering work (Makar et al., 2001), others have explored master-slave architectures (Kong et al., 2017), feudal multi-agent hierarchies (Ahilan & Dayan, 2019), temporal abstraction (Tang et al., 2018), dynamic termination (Han et al., 2019) and skill discovery (Yang et al., 2019). The field of planning on decentralised partially observable Markov decision processes (Oliehoek & Amato, 2016) has also seen work leveraging macro-actions (Amato et al., 2019).…”
Section: Hierarchical Reinforcement Learning (mentioning, confidence: 99%)
“…However, despite the advantages brought by using options, their temporally-extended nature means that agents' responses can be inconsistent when the environment or other agents' behaviour changes. To tackle this problem, Han et al. (2019) proposed a dynamic termination scheme which allows an agent to flexibly terminate its current option. Both option-critic and our approach use a pool of actors, but whereas in the former the actors model options, in the latter they model policies, preventing inconsistent agent behaviours.…”
Section: Related Work (mentioning, confidence: 99%)
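
For contrast with static termination, the dynamic termination scheme quoted above can be sketched as a high-level controller that scores "terminate now" as one more choice at every step, so an agent can abandon its option early when other agents' behaviour changes. This is a hedged sketch under assumed names (`q_values`, `TERMINATE`, `continue_option`), not the exact formulation of Han et al. (2019).

```python
import random

TERMINATE = "terminate"  # hypothetical sentinel for "end the option now"

def continue_option(q_values, obs_key, current_option, epsilon=0.1):
    """Decide at every step whether to keep the current option.

    Illustrative sketch only: q_values is an assumed dict mapping
    (obs_key, choice) -> estimated value, where choice is an option name
    or the TERMINATE sentinel. Scoring termination as just another choice
    is what lets the agent cut an option short when circumstances change.
    """
    choices = [current_option, TERMINATE]
    if random.random() < epsilon:
        choice = random.choice(choices)
    else:
        choice = max(choices, key=lambda c: q_values.get((obs_key, c), 0.0))
    return choice != TERMINATE  # False => caller re-plans a new option
```

When `continue_option` returns False, the caller would re-run its option-selection step, for instance the selection step from the earlier sketch, conditioned on the latest broadcast intentions.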
“…Recent work converts the MARL problem to a single-agent setting by using a single Q-function across all agents [Lowe et al. 2017]. Other recent work has begun to combine MARL and HRL, but is limited to simple discrete grid environments, uses additional methods to stabilize the optimization, and includes communication [Han et al. 2019; Tang et al. 2018]. Instead, our work tackles multi-agent articulated humanoid simulation through a combination of goal-conditioned learning and partial parameter sharing: assuming that all agents share task-agnostic locomotion and optimize similar goals allows us to keep the modularity and autonomy benefits of decentralized methods while significantly reducing the model size.…”
Section: Multi-agent Reinforcement Learning (mentioning, confidence: 99%)
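
The goal-conditioned learning with partial parameter sharing described in this last statement can be sketched briefly: a single task-agnostic low-level network is shared across all agents, and only the per-agent goal input differs. The PyTorch sketch below uses hypothetical names (`SharedLocomotion`) and assumed dimensions; it is not the cited authors' model.

```python
import torch
import torch.nn as nn

class SharedLocomotion(nn.Module):
    """Task-agnostic low-level controller shared by every agent (sketch)."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal):
        # Goal-conditioned: the same weights serve all agents; only the
        # goal input differs per agent.
        return self.net(torch.cat([obs, goal], dim=-1))

# One shared module, many agents: each agent supplies its own goal.
shared = SharedLocomotion(obs_dim=10, goal_dim=3, act_dim=4)
obs = torch.randn(2, 10)   # observations for two agents
goals = torch.randn(2, 3)  # per-agent goals
actions = shared(obs, goals)
```

Sharing the weights keeps execution decentralised while shrinking the total parameter count, which is the trade-off the statement highlights.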