2020
DOI: 10.48550/arxiv.2009.04197
Preprint

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Cited by 2 publications (3 citation statements)
References 8 publications
“…Brief Description
IQL (Tampuu et al, 2017): Independent Q-learning
VDN (Sunehag et al, 2017): Value decomposition network
COMA (Foerster et al, 2017): Counterfactual Actor-critic
QMIX (Rashid et al, 2018): Monotonicity Value decomposition
QTRAN (Son et al, 2019): Value decomposition with linear affine transform
MAVEN (Mahajan et al, 2019): MARL with variational method for exploration
QR-MIX (Hu et al, 2020): MARL with Centralized Distributional Q
LH-IQN (Lyu & Amato, 2020): Likelihood Hysteretic with IQN (independent learning)
Qatten (Yang et al, 2020): Multi-head Attention for the estimation of the Q_tot…”
Section: Training Details (mentioning)
confidence: 99%
“…Therefore, instead of expected values, learning distributions of future returns, i.e., Q values, are more useful for agents to make decisions. Recently, QR-MIX (Hu et al, 2020) decomposes the estimated joint return distribution (Bellemare et al, 2017; Dabney et al, 2018a) into individual Q values. However, the policies in QR-MIX are still individual Q values.…”
Section: Introduction (mentioning)
confidence: 99%