2021
DOI: 10.48550/arxiv.2102.12957
Preprint

Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning

Jianzhun Shao,
Hongchang Zhang,
Yuhang Jiang
et al.

Abstract: Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, i.e., the states of all agents and the related environment, for decomposing Q values into individual credits, we propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework to distill the global hierarchy for delicate reward decomposition. The excitation signal for lear…
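
The abstract is truncated, but the mechanism it sketches, scoring the mixing network's credit decomposition by the episode-reward difference before and after an "exercise update" of the utility network, can be illustrated with a toy training step. This is a hypothetical sketch, not the authors' code: the network shapes, the stand-in episode_return probe, and the REINFORCE-style mixer loss are all assumptions made for illustration.

    # Toy sketch of a meta-policy-gradient step for a mixing network, loosely
    # following the idea in the abstract. All names and sizes are illustrative
    # assumptions, not the paper's implementation.
    import copy
    import torch
    import torch.nn as nn

    n_agents, state_dim = 3, 8
    utility = nn.Linear(state_dim, 1)   # stand-in per-agent utility network
    mixer = nn.Sequential(nn.Linear(n_agents, 16), nn.ReLU(), nn.Linear(16, n_agents))
    mixer_opt = torch.optim.Adam(mixer.parameters(), lr=1e-3)

    def episode_return(net):
        # Stand-in for a rollout: score the network on a fixed probe state.
        probe = torch.ones(1, state_dim)
        with torch.no_grad():
            return net(probe).item()

    state = torch.randn(1, state_dim)
    local_qs = torch.randn(1, n_agents)

    # Credits produced by the current mixing network (one scalar per agent).
    credits = mixer(local_qs)

    # "Exercise update": train a throwaway copy of the utility network toward
    # the credit-derived target, leaving the real utility network untouched.
    exercised = copy.deepcopy(utility)
    inner_opt = torch.optim.SGD(exercised.parameters(), lr=0.1)
    target = credits.sum().detach()
    inner_loss = ((exercised(state) - target) ** 2).mean()
    inner_opt.zero_grad()
    inner_loss.backward()
    inner_opt.step()

    # Excitation signal: episode-reward difference before vs. after the
    # exercise update.
    excitation = episode_return(exercised) - episode_return(utility)

    # Meta update on the mixer: reinforce decompositions whose exercise
    # updates improved the return (sign convention assumed).
    mixer_loss = -excitation * credits.sum()
    mixer_opt.zero_grad()
    mixer_loss.backward()
    mixer_opt.step()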

Cited by 1 publication (2 citation statements) | References: 19 publications
“…While some researchers [11] pay more attention to the coordination relationships among agents, inventing coordination knowledge transfer with better generalization and scalability, Shao et al. [40] prefer a self-improvement mechanism that uses no prior information. However, the challenge for the logic criterion of CA is that the prior knowledge cannot be presented clearly and applied to reinforcement learning, or it is not always useful for every multi-agent system.…”
Section: Discussion
confidence: 99%
“…In this module, the weights of each layer are generated by a hypernetwork with absolute-value calculations, so that the integration of local Q-values satisfies the monotonicity constraint and global information can be used more fully and flexibly to estimate the Q-value of joint actions, improving the learning and convergence of the global Q-value. Shao et al. [40] propose the Mixing Network with Meta Policy Gradient (MNMPG), which assigns proper credit to each agent using a global hierarchy with a meta policy gradient. Zhou et al. [11] propose the level-adaptive QTransformer (LA-QTransformer), a novel mixing network with a multi-head attention module that combines all coordination patterns and generates the credit-assignment weights.…”
Section: Mixing Network
confidence: 99%
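
The module described in the first sentence of this statement is the familiar QMIX-style monotonic mixer: a hypernetwork conditioned on the global state generates the mixing weights, and an absolute-value operation keeps them non-negative, so the joint Q-value is monotone in every agent's local Q-value. A minimal sketch, with layer sizes and dimensions chosen only for illustration:

    # QMIX-style monotonic mixing network: state-conditioned hypernetworks
    # produce the mixing weights; abs() keeps them non-negative, enforcing
    # dQ_tot/dQ_i >= 0. Sizes are illustrative, not the cited paper's.
    import torch
    import torch.nn as nn

    class MonotonicMixer(nn.Module):
        def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
            super().__init__()
            self.n_agents, self.embed_dim = n_agents, embed_dim
            # Hypernetworks: global state -> weights/biases of each mixing layer.
            self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
            self.hyper_b1 = nn.Linear(state_dim, embed_dim)
            self.hyper_w2 = nn.Linear(state_dim, embed_dim)
            self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                          nn.ReLU(), nn.Linear(embed_dim, 1))

        def forward(self, local_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
            # local_qs: (batch, n_agents), state: (batch, state_dim)
            b = state.size(0)
            # abs() keeps mixing weights non-negative (monotonicity);
            # biases stay unconstrained.
            w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
            b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
            hidden = torch.relu(torch.bmm(local_qs.unsqueeze(1), w1) + b1)
            w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
            b2 = self.hyper_b2(state).view(b, 1, 1)
            return (torch.bmm(hidden, w2) + b2).view(b, 1)   # joint Q_tot

    # Usage: mix three agents' local Q-values under a 10-dim global state.
    mixer = MonotonicMixer(n_agents=3, state_dim=10)
    q_tot = mixer(torch.randn(4, 3), torch.randn(4, 10))   # -> shape (4, 1)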