2016
DOI: 10.1016/j.neucom.2016.01.031
Multi-agent reinforcement learning as a rehearsal for decentralized planning

Abstract: Decentralized partially-observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have been recently proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and …
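For readers unfamiliar with the formalism, the sketch below spells out the standard Dec-POMDP tuple ⟨I, S, {Aᵢ}, T, R, {Ωᵢ}, O, h⟩ as a Python structure. It is a minimal illustration of the definition only; the class and field names are ours, not an API from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Illustrative sketch of the standard Dec-POMDP tuple
# <I, S, {A_i}, T, R, {Omega_i}, O, h>; names are ours, not the paper's.
@dataclass
class DecPOMDP:
    agents: List[str]                      # I: the set of agents
    states: List[str]                      # S: the hidden world states
    actions: Dict[str, List[str]]          # A_i: per-agent action sets
    observations: Dict[str, List[str]]     # Omega_i: per-agent observation sets
    # T(s, joint_a, s') -> probability of transitioning to state s'
    transition: Callable[[str, Tuple[str, ...], str], float]
    # R(s, joint_a) -> a single team reward shared by all agents
    reward: Callable[[str, Tuple[str, ...]], float]
    # O(s', joint_a, joint_o) -> probability of the joint observation
    observe: Callable[[str, Tuple[str, ...], Tuple[str, ...]], float]
    horizon: int                           # h: finite planning horizon

# Each agent must act on its own action-observation history alone;
# no agent ever sees the state s or the other agents' observations.
```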

Cited by 267 publications (126 citation statements) | References 9 publications
“…Reinforcement learning as a rehearsal (RLaR) (Kraemer & Banerjee, 2016) is a related approach where, rather than a complete demonstration, an external entity (which could be a human) informs the learning agents about the parts of their state spaces that are hidden from their view, enabling them to perform RL as a rehearsal; the agents must nevertheless learn policies that do not rely on the hidden parts of their states. In da Silva et al. (2017), similar online feedback is exchanged among the agents themselves, but in this work we seek to limit any advice to an off-line prior, thus limiting the need for communication during learning.…”
Section: Related Work (mentioning, confidence: 99%)
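One way to picture the rehearsal idea described in this statement is a learner that may peek at the hidden state while training but must deploy a policy conditioned only on its local observation. The sketch below follows that reading; the tabular forms Q(s, o, a) and Q(o, a), the visitation-weighted distillation step, and all names are our assumptions, not the exact RLaR procedure.

```python
import random
from collections import defaultdict

# Sketch of "RL as a rehearsal": during learning the agent may peek at the
# hidden state s, but the policy it keeps must depend only on its local
# observation o. Tabular forms and constants are our simplifications.
Q_rehearsal = defaultdict(float)   # Q(s, o, a): uses privileged state
Q_execution = defaultdict(float)   # Q(o, a):    what is actually deployed
visits = defaultdict(int)          # counts of (s, o) pairs, for marginalizing

ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def rehearsal_update(s, o, a, r, s2, o2, actions):
    """One Q-learning step with access to the hidden state during learning."""
    best_next = max(Q_rehearsal[(s2, o2, a2)] for a2 in actions)
    Q_rehearsal[(s, o, a)] += ALPHA * (r + GAMMA * best_next - Q_rehearsal[(s, o, a)])
    visits[(s, o)] += 1

def distill(states, observations, actions):
    """Average rehearsal values over hidden states, weighted by visitation,
    so the kept policy no longer relies on the hidden part of the state."""
    for o in observations:
        total = sum(visits[(s, o)] for s in states) or 1
        for a in actions:
            Q_execution[(o, a)] = sum(
                visits[(s, o)] * Q_rehearsal[(s, o, a)] for s in states
            ) / total

def act(o, actions):
    """Execution-time policy: epsilon-greedy on observation-only values."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q_execution[(o, a)])
```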
“…The model involving centralized training of decentralized policies has created considerable demand for the efficient training of multiple agents [19], [29]. This model can address the challenge of non-Markovian and nonstationary environments during learning [15] and can access the additional state information of other agents while promoting communication [31].…”
Section: Introduction (mentioning, confidence: 99%)
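The centralized-training/decentralized-execution pattern this statement refers to can be sketched structurally: per-agent actors act on local observations only, while a training-time critic sees the joint observation-action vector. The linear models, dimensions, and names below are illustrative assumptions, not any specific published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural sketch of centralized training with decentralized execution.
N_AGENTS, OBS_DIM, N_ACTIONS = 2, 4, 3

# Decentralized actors: each agent maps ONLY its local observation to
# action logits, so execution requires no communication.
actor_weights = [rng.normal(size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

# Centralized critic: during training it may see the concatenation of all
# agents' observations and one-hot actions (the "additional state information").
critic_weights = rng.normal(size=N_AGENTS * (OBS_DIM + N_ACTIONS))

def act(agent_id, local_obs):
    """Execution: sample from a softmax over the agent's own logits."""
    logits = local_obs @ actor_weights[agent_id]
    p = np.exp(logits - logits.max())
    return rng.choice(N_ACTIONS, p=p / p.sum())

def centralized_value(all_obs, all_actions):
    """Training-only critic over the joint observation-action vector."""
    onehots = [np.eye(N_ACTIONS)[a] for a in all_actions]
    joint = np.concatenate([x for pair in zip(all_obs, onehots) for x in pair])
    return float(critic_weights @ joint)

# Example: actions are chosen locally, value is estimated jointly.
obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
acts = [act(i, obs[i]) for i in range(N_AGENTS)]
v = centralized_value(obs, acts)
```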
“…As a result, multiple agents perform RL individually. However, a distributed architecture suffers from the moving-target problem [24], in which the behavior of each agent can affect the behaviors of the other agents. By contrast, the centralized architecture used in this paper assumes a single agent controlling all cells in the mobile network.…”
Section: Introduction, A. Background (mentioning, confidence: 99%)
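The moving-target problem mentioned in this statement is easy to reproduce in a toy setting: two independent Q-learners in a repeated coordination game, where each learner's update target drifts as the other agent's policy changes. Payoffs, constants, and names in the sketch are illustrative.

```python
import random

# Toy illustration of the "moving target" problem: two independent
# Q-learners in a repeated 2x2 coordination game. Each agent treats the
# other as part of the environment, so its learning target drifts as the
# other agent's policy changes.
PAYOFF = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}
ALPHA, EPS = 0.1, 0.2

Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[i][a]: agent i's value for action a

def choose(i):
    """Epsilon-greedy action selection for agent i."""
    if random.random() < EPS:
        return random.randrange(2)
    return max((0, 1), key=lambda a: Q[i][a])

for step in range(5000):
    a0, a1 = choose(0), choose(1)
    r = PAYOFF[(a0, a1)]  # both agents share the team reward
    # Each update is toward a target that depends on the OTHER agent's
    # current, still-changing policy -- the environment is nonstationary
    # from each individual learner's point of view.
    Q[0][a0] += ALPHA * (r - Q[0][a0])
    Q[1][a1] += ALPHA * (r - Q[1][a1])

print("Agent 0 Q-values:", Q[0])
print("Agent 1 Q-values:", Q[1])
```

Which of the two symmetric equilibria the learners settle on (if they settle at all) depends on the random history of play, which is the nonstationarity that centralized training is meant to sidestep.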