2018
DOI: 10.1609/aaai.v32i1.11794

Counterfactual Multi-Agent Policy Gradients

Abstract: Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agent…
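
The sketch below illustrates the core idea the abstract describes: a centralised critic supplies each decentralised actor with a counterfactual advantage, i.e. the Q-value of the action actually taken minus a baseline that marginalises out that agent's own action under its current policy. The function name, array shapes, and discrete-action assumption are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def coma_advantage(q_values, policy_probs, taken_action):
    """Counterfactual advantage for a single agent (illustrative sketch).

    q_values:     shape (n_actions,), the centralised critic's Q(s, (u^-a, u'))
                  for every alternative action u' of this agent, with the other
                  agents' actions held fixed.
    policy_probs: shape (n_actions,), this agent's policy pi^a(u' | tau^a).
    taken_action: index of the action the agent actually executed.
    """
    # Counterfactual baseline: marginalise out this agent's own action under
    # its current policy while keeping the other agents' actions fixed.
    baseline = np.dot(policy_probs, q_values)
    # Advantage of the executed action relative to that baseline.
    return q_values[taken_action] - baseline
```

Because the baseline does not depend on the action the agent actually took, subtracting it leaves the expected policy gradient unchanged while reducing its variance; this is the role the centralised critic plays for each decentralised actor.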

Cited by 952 publications (373 citation statements)
References 34 publications
“…When a high-fidelity simulator is available, this may not be problematic and in this case one may also consider improving performance by applying centralised training with decentralised execution in the online phase (e.g. [74,75]). Alternatively, for rapid adaptation, (theoretical or empirical) demonstrations of sample complexity are required rather than the asymptotic convergence guarantees for model-free MARL (e.g.…”
Section: Multi-agent Reinforcement Learning
confidence: 99%
“…The goal of MARL is to derive decentralized policies for agents and impose a consensus to conduct a collaborative task. To achieve this, the multi-agent deep deterministic policy gradient (MADDPG) [22] and counterfactual multi-agent (COMA) [23] construct a centralized critic to train decentralized actors by augmenting it with extra information about other agents, such as observations and actions. Compared with independent learning [24], which only uses local information, MADDPG and COMA can derive better policies in a non-stationary environment.…”
Section: Related Work
confidence: 99%
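
The centralised critic described in the quote above, one that is "augmented with extra information about other agents, such as observations and actions", might look roughly like the following. Layer sizes, the one-hot action encoding, and the PyTorch framing are assumptions for illustration, not the MADDPG or COMA reference architecture.

```python
import torch
import torch.nn as nn

class CentralisedCritic(nn.Module):
    """Rough sketch of a critic that, during training, conditions on every
    agent's observation and action (the extra information the quote refers
    to). Layer widths and input encodings are illustrative assumptions."""

    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar value for the joint state-action
        )

    def forward(self, all_obs, all_actions):
        # all_obs:     (batch, n_agents, obs_dim)
        # all_actions: (batch, n_agents, act_dim), e.g. one-hot joint actions
        joint = torch.cat([all_obs.flatten(1), all_actions.flatten(1)], dim=-1)
        return self.net(joint)
```

At execution time each actor acts from its local observation only; a critic like the one above is used only during training, which is what distinguishes centralised training with decentralised execution from independent learning.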
“…After training, θ^π_l and θ^Q_l are updated as [40]. Our method can be extended to continuous action space by estimating the expectation of b_i with Monte Carlo samples or a learnable state value function V(o_i, m_i) [23].…”
Section: Implementation In An Actor-critic Framework
confidence: 99%
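
A Monte Carlo estimate of the baseline expectation mentioned in that quote might look roughly like the sketch below. The names `critic`, `fixed_inputs`, and `policy` are placeholders assumed for illustration, and `policy` is taken to be a torch.distributions object so that `.sample()` draws actions from the agent's current policy.

```python
import torch

def monte_carlo_baseline(critic, fixed_inputs, policy, n_samples=16):
    """Estimate a baseline b_i by Monte Carlo when the agent's action space
    is continuous and exact marginalisation over its own actions is
    intractable (illustrative sketch with placeholder arguments)."""
    samples = [critic(fixed_inputs, policy.sample()) for _ in range(n_samples)]
    # Averaging the critic over sampled own-actions approximates
    # E_{a ~ pi_i}[ Q(fixed_inputs, a) ].
    return torch.stack(samples).mean(dim=0)
```

The alternative the quote mentions, a learnable state value function V(o_i, m_i), would replace the sampling loop with a single forward pass of a trained value network.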
“…A key problem while learning from global rewards in multiagent setting is that the gradient computed for an agent i does not explicitly reason about the contribution of that agent to the global team reward. As a result, the gradient becomes noisy given that other agents are also exploring, leading to poor quality solutions (Foerster et al. 2017; Bagnell and Ng 2005). Fortunately, creating a separation among local MDPs of agents and joint event-based rewards automatically addresses this problem of noisy gradient in TIDec-MDPs.…”
Section: Multiagent Credit Assignment
confidence: 99%
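
One classic way to expose an individual agent's contribution to a shared team reward, closely related to the credit-assignment problem described in that quote, is a difference-rewards style counterfactual. The toy below is a hedged illustration with an assumed default action; it is not the TIDec-MDP construction the quoted paper uses, nor COMA's learned baseline.

```python
def difference_reward(team_reward_fn, joint_action, agent_idx, default_action=0):
    """Difference-rewards style counterfactual (illustrative sketch): compare
    the team reward actually obtained with the reward that would have been
    obtained had this agent taken a fixed default action instead."""
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = default_action
    return team_reward_fn(joint_action) - team_reward_fn(counterfactual)

# Toy team reward in which only agent 0's action matters.
def team_reward(joint):
    return 10.0 * joint[0]

print(difference_reward(team_reward, [1, 1], agent_idx=0))  # 10.0: agent 0 contributed
print(difference_reward(team_reward, [1, 1], agent_idx=1))  # 0.0: agent 1 did not
```

Replacing the raw team reward with such a per-agent counterfactual removes the noise contributed by other agents' exploration from each agent's gradient signal, which is exactly the problem the quoted passage identifies.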