Rethinking formal models of partially observable multiagent decision making

Kovařík, Vojtěch; Schmid, Martin; Burch, Neil; Bowling, Michael; Lisý, Viliam

doi:10.1016/j.artint.2021.103645

Cited by 20 publications

(11 citation statements)

References 8 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…First, the fully observable case represents an important baseline for this new model class, against which partially observable scenarios can later be evaluated. Second, the solution presented here can be directly used when converting imperfect-information games to perfect information, such as the mechanism described in [9,17].…”

Section: Definition 7 (Ns-csgmentioning

confidence: 99%

“…As discussed later, this paper assumes a fully observable game setting; a natural extension would be a semantics for NS-CSGs using partially observable stochastic games (POSGs). A variant of POSGs, called factored-observation stochastic games (FOSGs), was recently proposed [17,33] distinguishing between private and public observations in a similar fashion to our NS-CSG model but for finite-state models with no NNs. Partial observability in FOSGs is dealt with via a mechanism for converting imperfect-information games into continuous-state (public belief state) perfectinformation games [9,17] such that many techniques for perfect-information games can also be applied to them.…”

Section: Introductionmentioning

confidence: 99%

“…A variant of POSGs, called factored-observation stochastic games (FOSGs), was recently proposed [17,33] distinguishing between private and public observations in a similar fashion to our NS-CSG model but for finite-state models with no NNs. Partial observability in FOSGs is dealt with via a mechanism for converting imperfect-information games into continuous-state (public belief state) perfectinformation games [9,17] such that many techniques for perfect-information games can also be applied to them. The fully-observable NS-CSG model proposed here can arguably serve as a vehicle to later solve the more complex case of NS-CSGs with imperfect information.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Strategy Synthesis for Zero-sum Neuro-symbolic Concurrent Stochastic Games (Extended Version)

Yan¹,

Cnop²,

Norman³

et al. 2022

Preprint

View full text Add to dashboard Cite

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise a set of probabilistic finite-state agents interacting in a shared continuous-state environment, observed through perception mechanisms implemented as neural networks. Since the environment state space is continuous, we focus on the class of NS-CSGs with Borel state spaces and Borel measurability restrictions on the components of the model.We consider the problem of zero-sum discounted cumulative reward, proving that NS-CSGs are determined and therefore have a value which corresponds to a unique fixed point. From an algorithmic perspective, existing methods to compute values and optimal strategies for CSGs focus on finite state spaces. We present, for the first time, value iteration and policy iteration algorithms to solve a class of uncountable state space CSGs, and prove their convergence. Our approach works by formulating piecewise linear or constant representations of the value functions and strategies of NS-CSGs. We validate the approach with a prototype implementation applied to a dynamic vehicle parking example.

show abstract

Section: Definition 7 (Ns-csgmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Strategy Synthesis for Zero-sum Neuro-symbolic Concurrent Stochastic Games (Extended Version)

Yan¹,

Cnop²,

Norman³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Here, we give a concise introduction to necessary concepts, which are based on the Factored-Observation Stochastic Games (FOSG) formalism. For further details on the formalism, see [40,64].…”

Section: Background and Terminologymentioning

confidence: 99%

Player of Games

Schmid¹,

Moravčík²,

Burch³

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

show abstract

“…In practice, rather than using h as input to our approximators, we use a concatenation of all players' observations, i.e. an encoding of the augmented information states or action-observation histories[10,22]. In some games this is sufficient to recover a full history.…”

mentioning

confidence: 99%

The Advantage Regret-Matching Actor-Critic

Gruslys¹,

Lanctot²,

Munos³

et al. 2020

Preprint

Self Cite

View full text Add to dashboard Cite

Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. These retrospective value estimates are used to predict conditional advantages which, combined with regret matching, produces a new policy. In particular, ARMAC learns from sampled trajectories in a centralized training setting, without requiring the application of importance sampling commonly used in Monte Carlo counterfactual regret (CFR) minimization; hence, it does not suffer from excessive variance in large environments. In the single-agent setting, ARMAC shows an interesting form of exploration by keeping past policies intact. In the multiagent setting, ARMAC in self-play approaches Nash equilibria on some partially-observable zero-sum benchmarks. We provide exploitability estimates in the significantly larger game of betting-abstracted no-limit Texas Hold'em.Preprint. Under review.

show abstract

Rethinking formal models of partially observable multiagent decision making

Cited by 20 publications

References 8 publications

Strategy Synthesis for Zero-sum Neuro-symbolic Concurrent Stochastic Games (Extended Version)

Strategy Synthesis for Zero-sum Neuro-symbolic Concurrent Stochastic Games (Extended Version)

Player of Games

The Advantage Regret-Matching Actor-Critic

Contact Info

Product

Resources

About