2021
DOI: 10.48550/arxiv.2112.04454
Preprint

Greedy-based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

Abstract: Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning (MARL) methods with linear or monotonic value decomposition suffer from relative overgeneralization. As a result, they cannot ensure optimal coordination. Existing methods address relative overgeneralization by achieving complete expressiveness or learning a bias, which is insufficient to solve the problem. In this paper, we propose optimal consistency, a criterion to evaluate the optimality …
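
The representation limitation named in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 3x3 payoff matrix is hypothetical, and a uniformly weighted least-squares fit stands in for training a linear value decomposition. It is not the method the paper proposes.

```python
# Hypothetical 2-agent, 3-action cooperative matrix game (payoffs are
# illustrative only). The best joint action is (0, 0), but miscoordinating
# around it is heavily penalized.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def fit_additive(payoff):
    """Least-squares fit of payoff[i][j] ~ q1[i] + q2[j].

    Assumes every joint action is weighted equally. For a full table the
    solution is row mean + column mean - grand mean; the grand mean is
    split evenly between the two agents here.
    """
    n, m = len(payoff), len(payoff[0])
    grand = sum(sum(row) for row in payoff) / (n * m)
    q1 = [sum(row) / m - grand / 2 for row in payoff]
    q2 = [sum(payoff[i][j] for i in range(n)) / n - grand / 2 for j in range(m)]
    return q1, q2


def argmax(values):
    """Index of the first maximum in a list."""
    return max(range(len(values)), key=values.__getitem__)


q1, q2 = fit_additive(PAYOFF)

# Each agent acts greedily on its own utility, as in decentralized execution.
greedy_joint = (argmax(q1), argmax(q2))

# True optimum found by searching the full joint action space.
optimal_joint = max(
    ((i, j) for i in range(3) for j in range(3)),
    key=lambda a: PAYOFF[a[0]][a[1]],
)

print("greedy joint action:", greedy_joint, "payoff:", PAYOFF[greedy_joint[0]][greedy_joint[1]])
print("optimal joint action:", optimal_joint, "payoff:", PAYOFF[optimal_joint[0]][optimal_joint[1]])
```

Under these assumptions the additive fit ranks action 0 below the safer actions for both agents, because its average payoff across the partner's actions is lower. The greedy joint action is therefore (1, 1) with payoff 0 rather than the optimum (0, 0) with payoff 8, which is the relative overgeneralization the abstract refers to.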

Citations: cited by 1 publication (2 citation statements)
References: 8 publications
“…To address this problem, QPLEX [25] and QTRAN [22] aim to learn value functions with complete expressiveness capacity. However, reports are that they perform poorly when being used in practice [5,24]. This is because learning the complete expressiveness is impractical in complicated MARL tasks due to the challenging exploration in large joint action spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…6.4.1 Comparison among Using Different 𝑧 Dimensions. We compare the performance of AVGM using different 𝑧 dimensions (8,16,24,32,40) in pursuit. The results are shown in Figure 5.…”
Section: Ablations (mentioning)
confidence: 99%