QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Hu, Jian; Harding, Seth Austin; Wu, Haibin; Hu, Siyue; Liao, Shih-Wei

doi:10.48550/arxiv.2009.04197

Cited by 2 publications

(3 citation statements)

References 8 publications


“…Brief Description
IQL (Tampuu et al, 2017): Independent Q-learning
VDN (Sunehag et al, 2017): Value decomposition network
COMA (Foerster et al, 2017): Counterfactual Actor-critic
QMIX (Rashid et al, 2018): Monotonicity Value decomposition
QTRAN (Son et al, 2019): Value decomposition with linear affine transform
MAVEN (Mahajan et al, 2019): MARL with variational method for exploration
QR-MIX (Hu et al, 2020): MARL with Centralized Distributional Q
LH-IQN (Lyu & Amato, 2020): Likelihood Hysteretic with IQN (independent learning)
Qatten (Yang et al, 2020): Multi-head Attention for the estimation of the Q tot…”

Section: Training Details (mentioning)

confidence: 99%

“…Therefore, instead of expected values, learning distributions of future returns, i.e., Q values, are more useful for agents to make decisions. Recently, QR-MIX (Hu et al, 2020) decomposes the estimated joint return distribution (Bellemare et al, 2017; Dabney et al, 2018a) into individual Q values. However, the policies in QR-MIX are still individual Q values.…”

Section: Introduction (mentioning)

confidence: 99%

“…CTDE (Oliehoek et al, 2008) has drawn enormous attention via training policies of each agent with access to global trajectories in a centralized way and executing actions given only the local observations of each agent in a decentralized way. However, current MARL methods (Lowe et al, 2017; Foerster et al, 2017; Sunehag et al, 2017; Rashid et al, 2018; Son et al, 2019; Hu et al, 2020) neglect the limited representation of agent values, thus failing to consider the problem of random cost underlying the nonstationarity of the environment, a.k.a. risk-sensitive learning. Recent advances in distributional RL (Bellemare et al, 2017; Dabney et al, 2018b) focus on learning distribution over returns.…”

Section: Introduction (mentioning)

confidence: 99%


RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Qiu, Wang, Rong et al. 2021

Preprint


Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such an expected, i.e., risk-neutral, Q value is not sufficient even with CTDE because of the randomness of rewards and the uncertainty in environments, which causes these methods to fail to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution. Then, to handle the temporal nature of the stochastic outcomes during execution, we propose a dynamic risk level predictor for risk level tuning. Finally, we optimize the CVaR policies during centralized training: CVaR values are used to estimate the target in the TD error, and they also serve as auxiliary local rewards to update the local distribution via the Quantile Regression loss. Empirically, we show that our method significantly outperforms state-of-the-art methods on challenging StarCraft II tasks, demonstrating enhanced coordination and improved sample efficiency.
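To make the two quantities named in this abstract concrete, here is a minimal NumPy sketch. It is not the RMIX implementation, whose details the abstract does not give. It assumes each action's return distribution is represented by N equally weighted quantile estimates, approximates the lower-tail CVaR at risk level alpha by averaging the lowest fraction alpha of them, and uses the standard pinball form of the quantile regression loss. All function names (`cvar_from_quantiles`, `greedy_cvar_action`, `pinball_loss`) and the example numbers are hypothetical.

```python
import math
import numpy as np


def cvar_from_quantiles(quantiles, alpha):
    """Approximate lower-tail CVaR at risk level alpha in (0, 1].

    Assumption: the last axis holds N equally weighted quantile estimates
    of the return, so CVaR is the mean of the lowest ceil(alpha * N) values.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    q = np.sort(np.asarray(quantiles, dtype=float), axis=-1)
    n = q.shape[-1]
    # Small tolerance guards against floating-point overshoot in alpha * n.
    k = max(1, math.ceil(alpha * n - 1e-9))
    return q[..., :k].mean(axis=-1)


def greedy_cvar_action(quantiles_per_action, alpha):
    """Pick the action whose approximate CVaR is largest.

    quantiles_per_action has shape (num_actions, N).
    """
    return int(np.argmax(cvar_from_quantiles(quantiles_per_action, alpha)))


def pinball_loss(quantiles, target, taus):
    """Standard quantile regression (pinball) loss for one scalar target.

    taus[i] is the quantile fraction that quantiles[i] should estimate.
    """
    q = np.asarray(quantiles, dtype=float)
    taus = np.asarray(taus, dtype=float)
    u = target - q  # positive where the estimate is below the target
    return float(np.mean(np.maximum(taus * u, (taus - 1.0) * u)))


# Hypothetical example: action 0 has the higher mean but a heavy lower tail.
returns = np.array([[-10.0, 0.0, 10.0, 20.0],
                    [1.0, 2.0, 3.0, 4.0]])
risk_neutral = greedy_cvar_action(returns, alpha=1.0)  # compares plain means
risk_averse = greedy_cvar_action(returns, alpha=0.5)   # compares lower halves
```

With alpha = 1.0 the CVaR reduces to the ordinary mean, so the risk-neutral choice is action 0 (mean 5 against 2.5). With alpha = 0.5 only the lower half of each distribution counts, and the choice switches to action 1 (1.5 against -5).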


Exploration in Deep Reinforcement Learning: A Comprehensive Survey

Yang, Tang, Bai et al. 2021

Preprint


Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, robotics, finance, healthcare, and transportation. However, DRL and deep MARL agents are widely known to be sample-inefficient: millions of interactions are usually needed even for relatively simple game settings, which prevents wide application and deployment in real-world industrial scenarios. A key bottleneck behind this is the well-known exploration problem, i.e., how to efficiently explore unknown environments and collect informative experiences that most benefit policy learning towards optimal policies.
