Deterministic limit of temporal difference reinforcement learning for stochastic games

Barfuß, Wolfram; Donges, Jonathan F.; Kurths, Jürgen

doi:10.1103/physreve.99.043305

Cited by 45 publications

(43 citation statements)

References 52 publications

“…See, e.g., Börgers and Sarin (1997) for the reinforcement learning model of Cross (1973); Hopkins (2002) and Beggs (2005) for that of Erev and Roth (1998); and Bloembergen et al (2015) for memoryless Q-learning. The application of stochastic approximation techniques to AI agents with memory is more subtle and is currently at the frontier of research, both in computer science and in statistical physics (Barfuss, Donges, and Kurths 2019). To the best of our knowledge, there are no results yet available for ε-greedy Q-learning.…”

Section: A Economic Environment

mentioning

confidence: 99%

Artificial Intelligence, Algorithmic Pricing, and Collusion

Calvano, Calzolari, Denicolò, et al. 2020

American Economic Review

281

Increasingly, algorithms are supplanting human decision-makers in pricing goods and services. To analyze the possible consequences, we study experimentally the behavior of algorithms powered by Artificial Intelligence (Q-learning) in a workhorse oligopoly model of repeated price competition. We find that the algorithms consistently learn to charge supracompetitive prices, without communicating with one another. The high prices are sustained by collusive strategies with a finite phase of punishment followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand, changes in the number of players, and various forms of uncertainty. (JEL D21, D43, D83, L12, L13)

“…Since, when the environment changes, the previous decision-making scheme adopted by individuals may fail to work, they must learn how to adjust their behaviours in response to the contingencies given by the environment, in order to obtain a higher fitness. Such a scenario is also closely related to some recent work across disciplines, including complexity science [49,67–70], artificial intelligence [44,56,71], evolutionary biology [72,73] and neuroscience [43]. However, their dominant attention has been paid to learning dynamics, the deterministic limit of the learning process, the design of new learning algorithms in games, or neural computations.…”

Section: Discussion

mentioning

confidence: 96%

“…In particular, our analysis for the game system is systematic and encompasses a variety of factors, such as group interactions, spatial structures and environmental variations. In addition, our work may offer some new insight into the interface between reinforcement learning and evolutionary game theory from the perspective of function approximation [44,50], because most existing progress in combining tools from these two fields to explore the interaction of multiple agents is based on value-based methods [49,56,70,71].…”

Section: Discussion

mentioning

confidence: 99%

Learning enables adaptation in cooperation for multi-player stochastic games

Huang, Cao, Wang 2020

J. R. Soc. Interface.

Interactions among individuals in natural populations often occur in a dynamically changing environment. Understanding the role of environmental variation in population dynamics has long been a central topic in theoretical ecology and population biology. However, the key question of how individuals, in the middle of challenging social dilemmas (e.g. the ‘tragedy of the commons’), modulate their behaviours to adapt to the fluctuation of the environment has not yet been addressed satisfactorily. Using evolutionary game theory, we develop a framework of stochastic games that incorporates the adaptive mechanism of reinforcement learning to investigate whether cooperative behaviours can evolve in the ever-changing group interaction environment. When the action choices of players are just slightly influenced by past reinforcements, we construct an analytical condition to determine whether cooperation can be favoured over defection. Intuitively, this condition reveals why and how the environment can mediate cooperative dilemmas. Under our model architecture, we also compare this learning mechanism with two non-learning decision rules, and we find that learning significantly improves the propensity for cooperation in weak social dilemmas, and, in sharp contrast, hinders cooperation in strong social dilemmas. Our results suggest that in complex social–ecological dilemmas, learning enables the adaptation of individuals to varying environments.

“…In our work, we opt for studying the dynamics of the CRD using a form of reinforcement learning in the PBL model to update players' behaviors, as it allows for the exploration of mixed strategies, and is widely accepted as a technique for learning behaviors observed within behavioral economic experiments. Reinforcement learning has been for instance applied to different variations of 2-player games [31,34,41–46] as well as bargaining games [47–49], coordination games [50,51] in well-mixed and structured populations, stochastic games [52,53] and other social dilemmas [54,55]. It provides a flexible and powerful framework for studying the dynamics and effects of different variables in the CRD, allowing for a large behavioral (strategic) space and mixed strategies.…”

Section: Related Work

mentioning

confidence: 99%