2011
DOI: 10.1609/aiide.v7i1.12430

Learning Policies for First Person Shooter Games Using Inverse Reinforcement Learning

Abstract: The creation of effective autonomous agents (bots) for combat scenarios has long been a goal of the gaming industry. However, a secondary consideration is whether the autonomous bots behave like human players; this is especially important for simulation/training applications which aim to instruct participants in real-world tasks. Bots often compensate for a lack of combat acumen with advantages such as accurate targeting, predefined navigational networks, and perfect world knowledge, which makes them challengi…

Cited by 23 publications (6 citation statements)
References 6 publications
“…This algorithm class allows for the generation of reward functions based on observed player traces. An example of this approach can be seen in the work by Tastan et al. (Tastan and Sukthankar 2011). These authors propose an agent that generates policies for the competitive game Unreal Tournament.…”
Section: Player Clustering (mentioning)
confidence: 99%
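The reward-recovery idea this statement references can be made concrete. Below is a minimal sketch, assuming a linear reward over a hand-chosen state feature map `phi` and player traces given as state sequences; it shows a generic feature-expectation-matching update in the style of apprenticeship learning, not the cited paper's exact algorithm.

```python
# Hedged sketch: linear-reward IRL via feature-expectation matching.
# `phi` (state feature map) and the trace format are illustrative assumptions.
import numpy as np

def feature_expectations(traces, phi, gamma=0.95):
    """Average discounted feature counts over a set of state trajectories."""
    mu = np.zeros_like(np.asarray(phi(traces[0][0]), dtype=float))
    for trace in traces:
        for t, s in enumerate(trace):
            mu += (gamma ** t) * np.asarray(phi(s), dtype=float)
    return mu / len(traces)

def update_reward_weights(w, expert_mu, agent_mu, lr=0.1):
    """Move the reward weights toward the expert's feature expectations and
    away from the current agent's; the learned reward is R(s) = w . phi(s)."""
    w = w + lr * (expert_mu - agent_mu)
    return w / max(np.linalg.norm(w), 1e-8)  # keep weights on the unit ball
```

In a full loop, one would alternate this weight update with re-solving the game's MDP under the current reward to refresh the agent's feature expectations.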
“…In general, the problem of constructing a human model is challenging, particularly for domains where human strategies are unpredictable. One possible option would be to learn a model from observed human data, either online (Barrett, Stone, and Kraus 2011) or offline (Tastan and Sukthankar 2012; Orkin 2008; Broz, Nourbakhsh, and Simmons 2011). Alternatively a human could be modeled as a noisy optimal solver for an MDP or POMDP formulation of the game, assuming such policies could be tractably found or approximated.…”
Section: Human Models (mentioning)
confidence: 99%
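A common concrete reading of the "noisy optimal solver" alternative mentioned above is a Boltzmann-rational policy over the Q-values of the game's MDP. Here is a minimal sketch, assuming a small tabular MDP with a known transition tensor and state rewards (both illustrative assumptions, not from the cited works).

```python
# Hedged sketch of a "noisy optimal" human model: solve a small tabular MDP
# with value iteration, then act with a softmax (Boltzmann) policy over
# Q-values. P, R, and the temperature are illustrative assumptions.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=500):
    """P: (A, S, S) transition probs, R: (S,) state rewards. Returns Q: (S, A)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = R[s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R[:, None] + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return Q

def noisy_optimal_policy(Q, temperature=1.0):
    """Softmax over Q-values; a higher temperature means a noisier human."""
    z = (Q - Q.max(axis=1, keepdims=True)) / temperature  # numerically stable
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

As the temperature approaches zero this recovers the optimal policy; raising it injects the "noise" that makes the model plausibly human.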
“…If we have a set of observational data from humans playing the game, we can use machine learning to infer a policy and use it as a human model. This modeling approach has been demonstrated by researchers for a variety of machine learning techniques including decision trees (Barrett, Stone, and Kraus 2011) and reinforcement learning (Tastan and Sukthankar 2012). We will call this general approach the machine learning model.…”
Section: Human Models (mentioning)
confidence: 99%
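The decision-tree variant of the "machine learning model" described in this statement can be illustrated directly. A minimal sketch, assuming hypothetical logged state features (health, ammo, distance to enemy) and discrete action labels; none of these names come from the cited work.

```python
# Hedged sketch: fit a decision tree on logged (state features, action) pairs
# to imitate a human policy. Feature layout and actions are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical gameplay log: [health, ammo, distance_to_enemy] -> action id
X = np.array([[100, 30, 5.0], [20, 2, 3.0], [80, 0, 12.0], [55, 15, 8.0]])
y = np.array([0, 1, 2, 0])  # 0 = attack, 1 = retreat, 2 = find ammo

human_model = DecisionTreeClassifier(max_depth=5).fit(X, y)
predicted = human_model.predict([[60, 10, 7.5]])  # predicted human-like action
```

A shallow tree keeps the inferred policy interpretable, which matters when the model is meant to stand in for a human during evaluation.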