2019
DOI: 10.1103/physreve.99.043305
Deterministic limit of temporal difference reinforcement learning for stochastic games

Abstract: Reinforcement learning in multiagent systems has been studied in the fields of economic game theory, artificial intelligence and statistical physics by developing an analytical understanding of the learning dynamics (often in relation to the replicator dynamics of evolutionary game theory). However, the majority of these analytical studies focus on repeated normal form games, which only have a single environmental state. Environmental dynamics, i.e., changes in the state of an environment affecting the agent…
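The link to the replicator dynamics of evolutionary game theory mentioned in the abstract can be made concrete with the standard single-population replicator equation (a textbook form shown here only for orientation, not an equation quoted from the paper):

\dot{x}_i = x_i \left[ (A x)_i - x^{\top} A x \right],

where x_i is the probability an agent (or the population share) assigns to strategy i and A is the payoff matrix. Analytical treatments of multiagent learning typically show that, in a suitable deterministic limit, the stochastic learning process reduces to dynamics of this type or to generalizations of them.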

Cited by 45 publications (43 citation statements). References: 52 publications.
“…See, e.g., Börgers and Sarin (1997) for the reinforcement learning model of Cross (1973); Hopkins (2002) and Beggs (2005) for that of Erev and Roth (1998); and Bloembergen et al. (2015) for memoryless Q-learning. The application of stochastic approximation techniques to AI agents with memory is more subtle and is currently at the frontier of research, both in computer science and in statistical physics (Barfuss, Donges, and Kurths 2019). To the best of our knowledge, there are no results yet available for ε-greedy Q-learning.…”
Section: A Economic Environment (mentioning)
confidence: 99%
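As an aside for readers unfamiliar with the Q-learning variants named above, the following minimal Python sketch illustrates the ε-greedy action choice and the temporal-difference update; the two-state, two-action environment and all numerical constants are hypothetical and serve only as an illustration, not as a reproduction of any cited model.

import numpy as np

# Minimal sketch of epsilon-greedy Q-learning (illustrative only).
# The environment below (two states, two actions, hand-picked rewards and
# deterministic transitions) is hypothetical.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

rewards = np.array([[1.0, 0.0],
                    [0.0, 1.0]])            # rewards[state, action]
next_state = np.array([[0, 1],
                       [1, 0]])             # deterministic state transitions

Q = np.zeros((n_states, n_actions))         # state-action value estimates
s = 0
for _ in range(5000):
    # Epsilon-greedy: explore uniformly with probability epsilon, otherwise exploit.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
    r = rewards[s, a]
    s_next = int(next_state[s, a])
    # Temporal-difference update toward the bootstrapped target r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)

Roughly speaking, the deterministic-limit analyses discussed in this literature replace the sampled reward and next state in such an update by their expectations, turning the stochastic learning process into a system of difference or differential equations that can be studied with dynamical-systems tools.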
“…Since, when the environment changes, the previous decision-making scheme adopted by individuals may fail to work, they must learn how to adjust their behaviours in response to the contingencies given by the environment in order to obtain a higher fitness. Such a scenario is also closely related to recent work across disciplines, including complexity science [49,67-70], artificial intelligence [44,56,71], evolutionary biology [72,73] and neuroscience [43]. However, these studies have mainly focused on learning dynamics, the deterministic limit of the learning process, the design of new learning algorithms in games, or neural computations.…”
Section: Discussion (mentioning)
confidence: 96%
“…In particular, our analysis of the game system is systematic and encompasses a variety of factors, such as group interactions, spatial structures and environmental variations. In addition, our work may offer new insight into the interface between reinforcement learning and evolutionary game theory from the perspective of function approximation [44,50], because most existing progress in combining tools from these two fields to explore the interaction of multiple agents is based on value-based methods [49,56,70,71].…”
Section: Discussion (mentioning)
confidence: 99%
“…In our work, we opt to study the dynamics of the CRD using a form of reinforcement learning, the PBL model, to update players' behaviors, as it allows for the exploration of mixed strategies and is widely accepted as a technique for learning behaviors observed in behavioral economic experiments. Reinforcement learning has, for instance, been applied to different variations of 2-player games [31,34,41-46], bargaining games [47-49], coordination games [50,51] in well-mixed and structured populations, stochastic games [52,53] and other social dilemmas [54,55]. It provides a flexible and powerful framework for studying the dynamics and effects of different variables in the CRD, allowing for a large behavioral (strategic) space and mixed strategies.…”
Section: Related Work (mentioning)
confidence: 99%