2020
DOI: 10.1007/s11071-019-05398-4

Oscillatory evolution of collective behavior in evolutionary games played with reinforcement learning

Abstract: Large-scale cooperation underpins the evolution of ecosystems and human society, and the collective behaviors that emerge from the self-organization of multi-agent systems are key to understanding it. As artificial intelligence (AI) prevails in almost all branches of science, it is of great interest to see what new insights into collective behavior can be obtained from a multi-agent AI system. Here, we introduce a typical reinforcement learning (RL) algorithm, Q-learning, into evolutionary game dynamics, where agents…

Cited by 37 publications (12 citation statements)
References 65 publications
“…This is the mainstream paradigm of modeling strategy updating. However, a new paradigm proposed recently [54,55] is inward learning, where individuals make decisions through introspective actions based on their own history. With the help of machine learning [56], they also model the evolution of cooperation, but this is limited to the single-game case.…”
Section: Summary and Discussion
confidence: 99%
“…Q-values are stored in a Q-table that records the relative utility of different actions in different states. Following Zhang and co-workers [41,42], the state set $\boldsymbol{S}$ and the action set $\boldsymbol{A}$ are taken to be identical, i.e. $\boldsymbol{S}=\boldsymbol{A}=\{C,D\}$.…”
Section: Methods
confidence: 99%
“…Then, the agent plays the selected action with its neighbours and calculates the resulting payoff. After that, the Q-value is updated by the following equation [41,43]:
$$Q_{s,a}(t+1) = (1-\eta)\,Q_{s,a}(t) + \eta\big[\Pi(t) + \gamma\, Q^{\max}_{s,a}(t)\big],$$
where $\eta\in(0,1]$ and $\gamma\in[0,1)$ denote the learning rate and the discount factor (the foresight level of agents), respectively, and $\Pi(t)$ is the calculated payoff.…”
Section: Methods
confidence: 99%
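To make the update rule quoted above concrete, here is a minimal Python sketch of one Q-learning round in the two-state, two-action setting $S=A=\{C,D\}$ described in the Methods statements. The parameter values, the epsilon-greedy action choice, and the payoff values are illustrative assumptions, not the cited authors' implementation; in the papers, the payoff $\Pi(t)$ comes from playing the game with lattice neighbours.

```python
import random

ACTIONS = ["C", "D"]                   # cooperate / defect; states coincide with actions
ETA, GAMMA, EPSILON = 0.1, 0.9, 0.02   # learning rate, discount factor, exploration rate (illustrative)

# Q-table: Q[state][action], initialised to zero
Q = {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}

def choose_action(state):
    """Epsilon-greedy selection over the current state's row of the Q-table (an assumed policy)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, payoff):
    """One Q-learning step: Q_{s,a} <- (1-eta) Q_{s,a} + eta [Pi + gamma * Q^max].
    Because S = A = {C, D}, the next state s' is simply the action just taken,
    and Q^max is the largest Q-value available in that next state."""
    next_state = action
    q_max = max(Q[next_state].values())
    Q[state][action] = (1 - ETA) * Q[state][action] + ETA * (payoff + GAMMA * q_max)
    return next_state

# One example round with a placeholder payoff (hypothetical numbers, for illustration only):
state = random.choice(ACTIONS)
action = choose_action(state)
payoff = 1.0 if action == "C" else 1.2
state = update(state, action, payoff)
```

Iterating this round over all agents on a lattice, with payoffs computed from the game matrix against neighbours, reproduces the kind of self-organizing dynamics the cited works study.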
“…Recently, Yang et al. [32] observed a tide-like burst in a probabilistic migration model of the prisoner's dilemma, where migration is driven by conformity and self-centered inequity-aversion norms. It is also worth noting that Zhang et al. [33] recently introduced the reinforcement learning framework, where periodic burst-like oscillations are also seen in pairwise games.…”
Section: Introduction
confidence: 99%