2021
DOI: 10.48550/arxiv.2112.08702
Preprint

Learning to Share in Multi-Agent Reinforcement Learning

Abstract: In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where a number of agents are deployed as a partially connected network and each interacts only with nearby agents. Networked MARL requires all agents to make decisions in a decentralized manner to optimize a global objective under restricted communication between neighbors over the network. Inspired by the fact that sharing plays a key role in humans' learning of cooperation, we propose LToS, a hierarchically decentralized M…
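The abstract describes agents that redistribute reward among their neighbors on a partially connected network. Below is a minimal Python sketch of neighbor-wise reward sharing on such a graph; the function name, the row-normalized weighting scheme, and the uniform example weights are illustrative assumptions, not the actual LToS algorithm.

```python
import numpy as np

def shared_rewards(rewards: np.ndarray, adjacency: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Redistribute individual rewards along network edges (illustrative sketch).

    rewards:   shape (N,), environment reward of each agent.
    adjacency: shape (N, N), 1 where agents are neighbors (self-loops included).
    weights:   shape (N, N), sharing weights; weights[i, j] is the fraction of
               agent i's reward given to neighbor j.
    Returns the reward each agent actually optimizes after sharing.
    """
    # Mask out non-neighbors and normalize each row so an agent gives away
    # exactly its own reward (row sums to 1 over its closed neighborhood).
    w = weights * adjacency
    w = w / w.sum(axis=1, keepdims=True)
    # Agent j receives sum_i w[i, j] * r_i; the total reward is conserved.
    return w.T @ rewards

# Example: a 3-agent line graph 0 - 1 - 2 with uniform (assumed) sharing weights.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
r = np.array([1.0, 0.0, 2.0])
w = np.ones((3, 3))
print(shared_rewards(r, adj, w))  # [0.5, 1.5, 1.0], total still 3.0
```

In this sketch the sharing weights are fixed; in a hierarchically decentralized scheme such as the one the abstract names, one would expect a higher-level policy to set these weights while a lower-level policy optimizes the resulting shared reward.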

Cited by 2 publications (5 citation statements)
References 9 publications
“…We selected the aforementioned role learning algorithm, ROMA (Wang et al., 2020), and the sanction-based algorithm, LIO (Yang et al., 2020), as baselines because our RESVO draws on their core ideas in SVO-based role emergence and role-based policy optimization, respectively. In addition, we selected several aforementioned SVO-based algorithms, including LToS (Yi et al., 2021) and D3C (Gemp et al., 2022), as baselines. Below we briefly introduce the core idea of each algorithm again.…”
Section: Methods
confidence: 99%
“…A reward-sharing agent $i$ learns an individual orientation function, $w^i_{\eta_i}: O_i \times A_{-i} \to \mathbb{R}^N$, parameterized by $\eta_i$, that maps its own observation $o_i$ and all other agents' actions $a_{-i}$ to a vector of reward-sharing ratios for all $N$ agents. Unlike existing methods based on the SVO (Peysakhovich and Lerer, 2018b; Baker, 2020; Gemp et al., 2022; Yi et al., 2021) or sanction (Koster et al., 2020; Lupu and Precup, 2020; Yang et al., 2020; Vinitsky et al., 2021; Dong et al., 2021) mechanism, the orientation function in RESVO (1) allows an agent to reward itself, and (2) does not require the sum of all sharing ratios to equal 1. This is one of the reasons why reward sharing, a mechanism used by existing work, can encourage the division of labor and solve ISD.…”
Section: SVO-based Role Emergence
confidence: 99%
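The orientation function quoted above maps an agent's observation and the other agents' actions to a vector of $N$ sharing ratios that need not sum to 1 and may include the agent itself. A minimal PyTorch sketch of such a mapping follows; the network architecture, dimensions, and the sigmoid output are assumptions for illustration, not taken from the RESVO paper.

```python
import torch
import torch.nn as nn

class OrientationFunction(nn.Module):
    """Illustrative orientation function w^i_{eta_i}: (o_i, a_{-i}) -> R^N."""

    def __init__(self, obs_dim: int, act_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        # Input: agent i's observation concatenated with the (N-1) other agents' actions.
        in_dim = obs_dim + act_dim * (n_agents - 1)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_agents),
        )

    def forward(self, obs_i: torch.Tensor, acts_minus_i: torch.Tensor) -> torch.Tensor:
        x = torch.cat([obs_i, acts_minus_i], dim=-1)
        # Sigmoid keeps each ratio in (0, 1) but, unlike a softmax, the entries
        # need not sum to 1, and the i-th entry lets the agent reward itself.
        return torch.sigmoid(self.net(x))
```

The sigmoid output head is one simple way to realize the two quoted properties (self-reward allowed, ratios not constrained to sum to 1); the citing paper may use a different parameterization.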