2017
DOI: 10.1371/journal.pone.0172395

Multiagent cooperation and competition with deep reinforcement learning

Abstract: Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating…
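The "manipulating the rewarding scheme" mentioned in the abstract refers to varying the reward given when a point is scored. A minimal sketch of one such parameterized scheme, assuming the convention that the conceding agent is always penalized while the scoring agent's reward ρ is swept between -1 and 1 (the function name, player labels, and exact sign convention are illustrative, not taken from the paper's code):

```python
def pong_rewards(scorer: str, rho: float) -> dict:
    """Per-agent rewards when one player scores in two-player Pong.

    Assumed convention: the conceding agent always receives -1, while the
    scoring agent receives rho in [-1, 1]. rho = 1 recovers the fully
    competitive (zero-sum) game; rho = -1 yields a fully cooperative one,
    where losing the ball hurts both players.
    """
    conceder = "right" if scorer == "left" else "left"
    return {scorer: rho, conceder: -1.0}
```

Sweeping rho from 1 down to -1 then traces the transition from competitive to collaborative behavior that the paper studies.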

Cited by 636 publications (354 citation statements) · References 21 publications
“…Visual Multi-Agent Reinforcement Learning: Multiagent systems result in non-stationary environments, posing significant challenges. Multiple approaches have been proposed over the years to address such concerns [82,83,81,30]. Similarly, a variety of settings, from multiple cooperative agents to multiple competitive ones, have been investigated [51,65,57,11,63,35,56,29,61].…”
Section: Related Work (mentioning)
confidence: 99%
“…Each agent estimates its own optimal Q-function, $Q^*(s,a) = \max_\pi Q^\pi(s,a)$, which satisfies the Bellman optimality equation $Q^*(s,a) = \mathbb{E}\left[r + \gamma \max_{a'} Q^*(s',a') \mid s,a\right]$. Under the assumption of full observability at each agent and fully decentralized control, Tampuu et al. combined IQL with deep Q-networks (DQN) and proposed that each agent train its Q-function, parameterized by a neural network $\theta_i$, by minimizing the loss function (Tampuu et al. 2017)…”
Section: Independent Q-learning (mentioning)
confidence: 99%
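For concreteness, a minimal PyTorch sketch of the per-agent loss this excerpt refers to: each independent learner $i$ minimizes the squared TD error $\left(r + \gamma \max_{a'} Q(s',a';\theta_i^-) - Q(s,a;\theta_i)\right)^2$ over transitions from its own replay buffer, with $\theta_i^-$ a periodically copied target network. Function and variable names here are illustrative, not from the cited papers:

```python
import torch
import torch.nn.functional as F

def iql_dqn_loss(q_net, target_net, batch, gamma=0.99):
    """TD loss for one independent DQN agent (IQL).

    batch: (s, a, r, s_next, done) tensors sampled from this agent's own
    replay buffer; q_net / target_net map states to per-action Q-values.
    """
    s, a, r, s_next, done = batch
    # Q(s, a; theta_i) for the actions actually taken
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max_a' Q(s', a'; theta_i^-) from the frozen target network
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done.float()) * q_next
    return F.smooth_l1_loss(q_sa, target)
```

From each learner's perspective the other agent is simply part of the environment, which is precisely what makes that environment non-stationary as both agents adapt.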
“…Brafman and Tennenholtz introduce a model-based reinforcement learning algorithm, R-Max, to deal with stochastic games [5]. Such stochastic elements can notably increase the complexity of multi-agent systems and multi-agent tasks, where agents learn to cooperate and compete simultaneously [6][10]. As other agents adapt and actively adjust their policies, the best policy for each agent evolves dynamically, giving rise to non-stationarity [8][9].…”
Section: Introduction (mentioning)
confidence: 99%