Effective communication can improve coordination in cooperative multi-agent reinforcement learning (MARL). One popular communication scheme is to exchange agents' local observations or latent embeddings and use them to augment each agent's local policy input. Such a communication paradigm can reduce uncertainty in local decision-making and induce implicit coordination. However, it enlarges agents' local policy spaces and increases learning complexity, leading to poor coordination in complex settings. To address this limitation, this paper proposes a novel framework named Multi-Agent Incentive Communication (MAIC), in which each agent learns to generate incentive messages that directly bias other agents' value functions, yielding effective explicit coordination. Our method first learns targeted teammate models, with which each agent can anticipate teammates' action selections and generate tailored messages for specific agents. We further introduce a novel regularization that leverages interaction sparsity to improve communication efficiency. MAIC is agnostic to the specific MARL algorithm and can be flexibly integrated with different value function factorization methods. Empirical results demonstrate that our method significantly outperforms baselines and achieves excellent performance on multiple cooperative MARL tasks.
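A minimal PyTorch-style sketch of what such incentive communication could look like: a sender predicts teammates' action preferences from its own observation and emits a per-teammate message vector, which receivers add to their local Q-values. All class names, shapes, and the simple additive bias are illustrative assumptions, not MAIC's actual implementation.

import torch
import torch.nn as nn

class IncentiveSender(nn.Module):
    """Hypothetical sender: models teammates and emits per-teammate messages."""

    def __init__(self, obs_dim, n_teammates, n_actions, hidden=64):
        super().__init__()
        # Teammate model: predicts each teammate's action preferences
        # from the sender's local observation.
        self.teammate_model = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_teammates * n_actions))
        # Message head: one tailored incentive vector per teammate.
        self.msg_head = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_teammates * n_actions))
        self.n_teammates, self.n_actions = n_teammates, n_actions

    def forward(self, obs):
        # obs: (batch, obs_dim) -> both outputs: (batch, n_teammates, n_actions)
        preds = self.teammate_model(obs).view(-1, self.n_teammates, self.n_actions)
        msgs = self.msg_head(obs).view(-1, self.n_teammates, self.n_actions)
        return preds, msgs

def biased_q_values(local_q, incoming_msgs):
    """Receiver side: incoming messages directly bias the local value function.
    local_q: (batch, n_actions); incoming_msgs: list of (batch, n_actions)."""
    return local_q + torch.stack(incoming_msgs, dim=0).sum(dim=0)

In this reading, the communicated quantity is an action-space bias rather than a raw observation, which is what makes the coordination explicit: a message changes which action the receiver prefers, not merely what it knows.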
Value-based multi-agent reinforcement learning (MARL) methods hold the promise of promoting coordination in cooperative settings. Popular MARL methods mainly focus on the scalability or the representational capacity of value functions. Such a learning paradigm can reduce agents' uncertainty and promote coordination. However, these methods fail to leverage the decomposability of the task structure that generally exists in real-world multi-agent systems (MASs), and thus spend a significant amount of time searching for the optimal policy in complex scenarios. To address this limitation, we propose a novel framework, Multi-Agent Concentrative Coordination (MACC), based on task decomposition, with which an agent can implicitly form local groups to reduce the learning space and facilitate coordination. In MACC, agents first learn representations for subtasks from their local information and then apply an attention mechanism to concentrate on the most relevant ones. Thus, agents can pay targeted attention to specific subtasks and improve coordination. Extensive experiments on various complex multi-agent benchmarks demonstrate that MACC achieves remarkable performance compared to existing methods.
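As a rough illustration of the attention step described above, the sketch below scores a fixed set of learned subtask embeddings against a query derived from the agent's local observation and returns a weighted context vector; the module name, embedding table, and shapes are hypothetical and not taken from MACC.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtaskAttention(nn.Module):
    """Hypothetical module: attend over learned subtask embeddings."""

    def __init__(self, obs_dim, n_subtasks, embed_dim=32):
        super().__init__()
        # One learned embedding per subtask (a shared table, for simplicity).
        self.subtask_embed = nn.Parameter(torch.randn(n_subtasks, embed_dim))
        # Query derived from the agent's local observation.
        self.query = nn.Linear(obs_dim, embed_dim)

    def forward(self, obs):
        # obs: (batch, obs_dim)
        q = self.query(obs)                                        # (batch, embed_dim)
        scores = q @ self.subtask_embed.t() / q.shape[-1] ** 0.5   # (batch, n_subtasks)
        weights = F.softmax(scores, dim=-1)   # concentrate on the most relevant subtasks
        context = weights @ self.subtask_embed                     # (batch, embed_dim)
        return context, weights

The attention weights are what lets an agent "implicitly form local groups": subtasks (and the teammates associated with them) that receive near-zero weight are effectively excluded from the agent's decision, shrinking the space it has to learn over.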
Multi-agent pathfinding (MAPF) is essential to large-scale robotic coordination tasks. Planning-based algorithms show advantages in collision avoidance while avoiding exponential growth with the number of agents. Reinforcement learning (RL)-based algorithms can be deployed efficiently but cannot prevent collisions entirely due to the lack of hard constraints. This paper combines the merits of planning-based and RL-based MAPF methods to propose a deployment-efficient and collision-free MAPF algorithm. Experiments show the effectiveness of our approach.
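A minimal sketch of one way such a combination can work, assuming a grid world and vertex conflicts only: an RL policy proposes each agent's next cell, and a hard reservation check rejects conflicting moves and falls back to a safe alternative. The function names and conflict handling are illustrative assumptions, not the paper's algorithm.

def step_agents(positions, rl_propose, fallback_plan):
    """positions: dict agent_id -> cell; rl_propose(agent, positions) and
    fallback_plan(agent, positions, reserved) each return a next cell."""
    proposed = {a: rl_propose(a, positions) for a in positions}
    reserved = set()
    next_positions = {}
    for agent, cell in proposed.items():
        if cell in reserved:
            # Vertex conflict: reject the RL move and fall back to a safe
            # alternative (e.g. wait in place or replan around reservations).
            cell = fallback_plan(agent, positions, reserved)
        reserved.add(cell)
        next_positions[agent] = cell
    return next_positions

# Example with trivial placeholders: both agents target the same cell,
# and the fallback is simply to wait in place.
positions = {"a1": (0, 0), "a2": (2, 0)}
to_center = lambda a, pos: (1, 0)
wait = lambda a, pos, reserved: pos[a]
print(step_agents(positions, to_center, wait))   # a2's move is rejected; it waits

The division of labor is what the abstract's "merits of both" refers to here: the learned policy supplies fast, decentralized proposals, while the planning-style check supplies the hard constraint that rules out collisions.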