2021
DOI: 10.48550/arxiv.2102.10616
Preprint

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Abstract: Non-stationarity is a thorny issue in multi-agent reinforcement learning, caused by the changes in agents' policies during the learning procedure. Current approaches to this problem, such as centralized critic with decentralized actors (CCDA), population-based self-play, and modeling of others, have their own limitations in effectiveness and scalability. In this paper, we introduce a δ-stationarity measurement to explicitly model the stationarity of a policy sequence, which is theoretically p…
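The abstract is truncated, but the core idea it names — measuring how stationary a sequence of policies is — can be illustrated with a small sketch. The criterion below (KL divergence between consecutive categorical policy snapshots bounded by a threshold δ) is an assumption for illustration, not the paper's exact definition.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """KL divergence between two categorical action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def is_delta_stationary(policy_sequence, delta):
    """Illustrative check: treat a policy sequence as delta-stationary if every
    consecutive pair of snapshots stays within the KL budget delta."""
    divergences = [kl_categorical(prev, curr)
                   for prev, curr in zip(policy_sequence, policy_sequence[1:])]
    return all(d <= delta for d in divergences), divergences

# Toy sequence of action distributions (3 actions) for one agent in one state.
seq = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20], [0.45, 0.35, 0.20]]
print(is_delta_stationary(seq, delta=0.01))
```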

Cited by 3 publications (4 citation statements)
References 17 publications
“…Although MATRL still needs knowledge about other agents' policies to adjust the step size during training, it does not need centralized critics or any communication channels. Besides, [70,71] attempted to apply trust-region methods in networked multi-agent settings by conducting consensus optimization with their neighbors. MATRL instead takes a game-theoretic approach, computing the meta-game Nash equilibrium to find policy-improvement directions without the networked assumption.…”
Section: Related Work
confidence: 99%
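As a rough illustration of the step-size idea in the excerpt above — shrinking an agent's own trust region when the other agents' policies have moved a lot — here is a minimal sketch. The specific rule and the `sensitivity` parameter are assumptions, not the scheme used by MATRL or by the trust-region decomposition in this paper.

```python
import numpy as np

def adaptive_kl_budget(base_delta, others_kl, sensitivity=1.0):
    """Shrink agent i's trust-region radius when teammates'/opponents'
    policies drifted a lot in the previous update (illustrative rule)."""
    drift = float(np.sum(others_kl))
    return base_delta / (1.0 + sensitivity * drift)

# Base KL budget of 0.02; two other agents drifted by 0.05 and 0.01,
# so the next update is taken more conservatively.
print(adaptive_kl_budget(0.02, [0.05, 0.01]))  # ~0.0189
```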
“…The third category encodes intentions into messages and allows agents to negotiate during communication, reducing the non-stationarity caused by the uncertainty of agents' policies. Some non-communication methods address this problem by modelling others, i.e., predicting others' behaviours or goals (Raileanu et al 2018; Rabinowitz et al 2018; Li et al 2021b). By contrast, communication allows agents to share their intentions directly with others instead of relying on inference.…”
Section: Related Work
confidence: 99%
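The excerpt contrasts communication with modelling others. A minimal sketch of the latter — predicting another agent's action distribution from the local observation with a linear-softmax head trained by cross-entropy — is given below; the architecture and names are illustrative assumptions, not any specific published model.

```python
import numpy as np

class OpponentModel:
    """Toy opponent model: predicts another agent's action distribution
    from the local observation via a linear-softmax head (illustrative)."""

    def __init__(self, obs_dim, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(obs_dim, n_actions))
        self.lr = lr

    def predict(self, obs):
        logits = obs @ self.W
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    def update(self, obs, observed_action):
        """One cross-entropy gradient step on the action the other agent took."""
        probs = self.predict(obs)
        grad = np.outer(obs, probs)
        grad[:, observed_action] -= obs
        self.W -= self.lr * grad

# Usage: fit on (observation, other-agent action) pairs collected during play.
model = OpponentModel(obs_dim=4, n_actions=3)
obs = np.array([0.2, -0.1, 0.5, 1.0])
model.update(obs, observed_action=2)
print(model.predict(obs))
```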
“…Previous studies mainly focus on learning when and with whom to share local observations, improving coordination compared to independent learning (Sukhbaatar, Szlam, and Fergus 2016; Jiang and Lu 2018; Das et al 2019). However, learning suffers from environment non-stationarity, which is caused by the changes in agents' policies during the learning procedure (Li et al 2021b). Modelling others has been put forward to alleviate the non-stationarity by predicting others' behaviours or goals (Raileanu et al 2018; Rabinowitz et al 2018).…”
Section: Introduction
confidence: 99%
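The excerpt above mentions learning when and with whom to share local observations. A toy gating sketch follows: a sigmoid score over the observation decides whether the agent broadcasts it. The linear gate and the threshold are illustrative assumptions, not a specific published architecture.

```python
import numpy as np

def message_gate(obs, w_gate, threshold=0.5):
    """Decide whether to share the local observation: broadcast it when the
    sigmoid gate score exceeds the threshold, otherwise stay silent."""
    score = 1.0 / (1.0 + np.exp(-float(obs @ w_gate)))
    return obs if score > threshold else None

# In practice the gate weights are trained end-to-end; here they are random.
rng = np.random.default_rng(0)
obs, w_gate = rng.normal(size=4), rng.normal(size=4)
print(message_gate(obs, w_gate))
```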
“…can be used to train agent strategy models. Strongly coupled multi-agent reinforcement learning refers to establishing multi-agent solution paradigms through global observations [9], factorized reward functions [10], or globally consistent loss functions [11], and solving the corresponding problems with deep neural networks or game theory. At present, weakly coupled reinforcement learning struggles to achieve model convergence in dynamically changing scenarios because its stationarity assumptions are violated; strongly coupled reinforcement learning exhibits different shortcomings depending on the coupling method.…”
Section: Introduction
confidence: 99%