Luke Marris scite author profile

Human-level performance in 3D multiplayer games with population-based reinforcement learning

arXiv:1807.01281v1 [cs.LG], 3 Jul 2018

Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments (30, 40, 45, 46, 56) and two-player turn-based games (47, 58, 66). However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag (28), using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents is trained concurrently from thousands of parallel matches, with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation, the trained agents exceeded the win-rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence. We demonstrate how intelligent behaviour can emerge from training sophisticated new learning agents within complex multi-agent environments.
End-to-end reinforcement learning methods (45, 46) have so far not succeeded in training agents in multi-agent games that combine team and competitive play, due to the high complexity of the learning problem (7, 43) that arises from the concurrent adaptation of other learning agents in the environment. We approach this challenge by studying team-based multiplayer 3D first-person video games, a genre which is particularly immersive for humans (16) and has even been shown to improve a wide range of cognitive abilities (21). We focus specifically on a modified version (5) of Quake III Arena (28), the canonical multiplayer 3D first-person video game, whose game mechanics served as the basis for many subsequent games, and which has a thriving professional scene (1). The task we consider is the game mode Capture the Flag (CTF), played on maps that are randomly generated for each game, with both indoor and outdoor themes (Figure 1 (a,b)). Two opposing teams consisting of multiple individual players compete to capture each other's flags by strategically navigating, tagging, and evading opponents. The team with the greatest number of flag captures after five minutes wins. CTF is play...

Backpropagation and the brain

Lillicrap et al. 2020

During learning the brain modifies synapses to improve behaviour. In the cortex synapses are embedded within multi-layered networks, making it difficult to determine the effect of an individual synaptic modification on the behaviour of the system. The backpropagation algorithm solves this problem in deep artificial neural networks, but has historically been viewed as biologically problematic. Nonetheless, recent developments in neuroscience and the successes of artificial neural networks have reinvigorated interest in whether backpropagation offers insights for understanding learning in the cortex. The backpropagation algorithm learns quickly by computing synaptic updates using feedback connections to deliver error signals. While feedback connections are ubiquitous in the cortex, it is difficult to see how they could deliver the error signals required by strict formulations of backpropagation. Here we build on past and recent developments to argue that feedback connections may instead induce neural activities whose differences can be used to locally approximate these signals, and hence drive effective learning in deep networks in the brain.

From motor control to team play in simulated humanoid football

Liu, Lever, Wang et al. 2022. Sci. Robot.

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. The agents were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analyses and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.

From Motor Control to Team Play in Simulated Humanoid Football

Liu, Lever, Wang et al. 2021. Preprint

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected so as to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning, and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning, and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of training stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and learn to play as a team, successfully bridging the gap between low-level motor control at a timescale of milliseconds and coordinated goal-directed behaviour as a team at a timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours, using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. We provide footage of the learned football skills in the supplementary video.

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Marris, Müller, Lanctot et al. 2021. Preprint

