2018
DOI: 10.48550/arxiv.1809.09332
Preprint
Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Abstract: Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and an exponentially growing policy space. The problem becomes even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination ba…
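As a rough illustration of the temporal abstraction the abstract describes, the sketch below shows a two-level control loop for a single agent: a high-level policy re-selects a skill every K steps, and a skill-conditioned low-level policy emits a primitive action at every step. The environment interface is assumed to be Gymnasium-style, and all names and hyperparameters here are illustrative, not from the paper.

```python
import random

K = 8            # high-level decision interval (temporal abstraction)
NUM_SKILLS = 4   # number of low-level skills/options (assumed)

def high_level_policy(obs):
    """Choose a skill index from the current observation (placeholder)."""
    return random.randrange(NUM_SKILLS)

def low_level_policy(obs, skill, action_space):
    """Choose a primitive action conditioned on the active skill (placeholder)."""
    return action_space.sample()

def run_episode(env, max_steps=200):
    obs, _ = env.reset()
    skill = high_level_policy(obs)
    option_return = 0.0
    for t in range(max_steps):
        if t % K == 0:                       # re-select the skill every K steps
            skill = high_level_policy(obs)
            option_return = 0.0
        action = low_level_policy(obs, skill, env.action_space)
        obs, reward, terminated, truncated, _ = env.step(action)
        option_return += reward              # accumulate reward over the option
        if terminated or truncated:
            break
```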

Citing publications published: 2019–2024

Publication Types

Select...
6
1

Relationship

0
7

Authors

Journals

Cited by 11 publications (18 citation statements)
References 13 publications
“…The MAXQ algorithm was designed to provide a hierarchical breakdown of the reinforcement learning problem by decomposing the value function for the main problem into a set of value functions for the sub-problems [5]. Tang et al. [51] used temporal abstraction to let agents jointly learn high-level coordination and independent skills at different temporal scales. Kumar et al. [16] presented another framework that benefits from temporal abstraction to achieve coordination among agents with reduced communication complexity.…”
Section: Relevant Prior Work
Mentioning, confidence: 99%
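For readers unfamiliar with MAXQ, the cited decomposition splits the value of a parent task into the value of the chosen subtask plus a learned completion term: Q(parent, s, a) = V(a, s) + C(parent, s, a). A minimal tabular sketch follows; the data structures are placeholders for illustration, not from any cited implementation.

```python
from collections import defaultdict

V = defaultdict(float)  # V[(subtask, state)]: value of executing a subtask in a state
C = defaultdict(float)  # C[(parent, state, subtask)]: value of completing the parent afterwards

def q_value(parent, state, subtask):
    # MAXQ decomposition: Q(parent, s, a) = V(a, s) + C(parent, s, a)
    return V[(subtask, state)] + C[(parent, state, subtask)]

def best_subtask(parent, state, children):
    # Greedy subtask choice under the decomposed value function.
    return max(children, key=lambda a: q_value(parent, state, a))
```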
“…In [140] a cooperative problem with sparse and delayed rewards is considered, in which each agent receives a local observation and takes a local action; the joint action is then submitted to the environment, which returns local rewards. Each agent has both low-level and high-level actions to take, so the task-selection problem for each agent can be modeled as a hierarchical RL problem.…”
Section: Emerging Areas
Mentioning, confidence: 99%
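The per-step interaction pattern this statement describes can be summarized in a short sketch: each agent maps its local observation to a high-level task and then a low-level action, and the joint action is submitted to the environment, which returns per-agent rewards. The agent and environment interfaces below are assumptions for illustration, not the cited paper's API.

```python
def step_agents(env, agents, local_obs):
    """One environment step in the hierarchical, locally-observed setting."""
    joint_action = []
    for agent, obs in zip(agents, local_obs):
        task = agent.select_task(obs)            # high-level choice (hypothetical method)
        action = agent.select_action(obs, task)  # low-level primitive action (hypothetical method)
        joint_action.append(action)
    # The environment consumes the joint action and returns local rewards.
    next_obs, local_rewards, done = env.step(joint_action)
    return next_obs, local_rewards, done
```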
“…It basically decomposes a multiagent problem into a collection of simultaneous single-agent problems that share the same environment. Although this approach does not address the non-stationarity introduced by the changing policies of the other agents, it commonly serves as a strong benchmark for a range of multiagent systems (MAS) [31,33,27].…”
Section: Definition 2 (DFA): A Deterministic Finite Automaton Is a Tuple
Mentioning, confidence: 99%
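A minimal sketch of such an independent learner, assuming tabular Q-learning over hashable local observations; the hyperparameters and interface are illustrative, not from the cited works.

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """One per-agent learner; other agents are treated as part of the environment."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q[obs] -> per-action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        # Epsilon-greedy over this agent's own Q-values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning backup on local experience only.
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])
```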