Modelling the Dynamics of Regret Minimization in Large Agent Populations: a Master Equation Approach

Wang, Zhen; Mu, Chunjiang; Hu, Shuyue; Chu, Chengcai; Li, Xuelong

doi:10.24963/ijcai.2022/76

Cited by 71 publications

(7 citation statements)

References 1 publication

“…(4) One Monte Carlo step is characterized by repeating procedures (2) and (3) for N times. (5) The steady state of the system is averaged over the last 2000 steps of the overall 20 000 steps. Moreover, the final results have been averaged over 10 independent runs to eliminate the effect of some uncertainties.…”

Section: Model (mentioning)

confidence: 99%
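
The averaging protocol in the statement above can be sketched as follows. This is a minimal illustration, not the citing paper's code: procedures (2) and (3) are not given in the excerpt, so they are stood in for by a hypothetical `update_once` callback, and `measure`, `toy_run`, and all function names are assumptions made for this example. Only the step counts (20 000 total, last 2000 averaged, 10 runs) come from the quoted text.

```python
import random
from statistics import mean


def steady_state_average(update_once, measure, n_agents,
                         total_steps=20_000, tail=2_000):
    """Average measure() over the last `tail` of `total_steps` Monte Carlo steps.

    One Monte Carlo step consists of n_agents elementary updates, so each
    agent is updated once per step on average.
    """
    samples = []
    for step in range(total_steps):
        for _ in range(n_agents):
            update_once()  # hypothetical stand-in for procedures (2) and (3)
        if step >= total_steps - tail:
            samples.append(measure())
    return mean(samples)


def average_over_runs(make_run, n_runs=10, **kwargs):
    """Repeat the experiment with independent seeds and average the results."""
    results = []
    for seed in range(n_runs):
        update_once, measure, n_agents = make_run(seed)
        results.append(steady_state_average(update_once, measure,
                                            n_agents, **kwargs))
    return mean(results)


def toy_run(seed):
    """Toy dynamics (an assumption): agents flip to a random strategy."""
    rng = random.Random(seed)
    cooperates = [rng.random() < 0.5 for _ in range(20)]

    def update_once():
        i = rng.randrange(len(cooperates))
        cooperates[i] = rng.random() < 0.5

    def measure():
        return sum(cooperates) / len(cooperates)

    return update_once, measure, len(cooperates)


if __name__ == "__main__":
    # Small step counts keep the demo fast; the quoted setup uses the defaults.
    print(average_over_runs(toy_run, n_runs=3, total_steps=200, tail=50))
```

Discarding the early steps and averaging only the tail removes the transient from the estimate, while averaging over independent seeds reduces run-to-run noise.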

“…The efficiency of multi-agent systems heavily relies on the cooperation among agents [1,2]. Over the past few years, the emergence of cooperative behaviors in multi-agent systems has been a prominent research topic [3][4][5]. Evolutionary game theory [6,7] provides a framework to model and simulate the evolution of behaviors in multi-agent system, where each individual in the game is treated as an agent, and the behavior evolution is achieved by interacting and learning with other agents.…”

Section: Introduction (mentioning)

confidence: 99%

Synergistic effects of adaptive reward and reinforcement learning rules on cooperation

et al. 2023

Self Cite

Understanding cooperative behavior in multi-agent systems is a research hotspot. In the context of pairwise interaction games, several studies have used reinforcement learning rules to successfully explain and predict the behavior of agents. However, multi-agent interactions are more general than two-agent interactions, and the effect of the reward mechanism on agents' behavior has been ignored under reinforcement learning rules. Therefore, this paper establishes a framework that combines the public goods game with reinforcement learning and adaptive reward. In this framework, the public goods game reflects the decision-making behavior of multi-agent interactions, self-regarding Q-learning emphasizes an experience-based strategy update, and adaptive reward focuses on adaptability. We concentrate mainly on their synergistic effects. Remarkably, while self-regarding Q-learning fails to prevent the collapse of cooperation in the traditional public goods game, the fraction of cooperation increases significantly when the adaptive reward strategy is included. Meanwhile, the theoretical analysis matches the simulation results well, which indicates that a specific reward cost maximizes the fraction of cooperation. Our findings may provide a new perspective for establishing cooperative reward mechanisms in social dilemmas.

“…In competitive environments, two-player zero-sum games are a fruitful area [10][11][12][13][14]. The optimization goal of the Counterfactual Regret Minimization (CFR) algorithm [19,20] matches the Nash Equilibrium and has worst-case guarantees. Reinforcement learning algorithms empower agents to master complex strategies from scratch self-play [21].…”

Section: Related Work (mentioning)

confidence: 99%
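
To make the link between regret minimization and Nash equilibrium concrete, here is a minimal sketch of regret matching in self-play on rock-paper-scissors. It is not CFR itself (CFR applies a regret-matching update of this kind at every information set of an extensive-form game) and it is not taken from either paper; the game, the function names, and the iteration count are assumptions chosen for illustration. In a two-player zero-sum game the time-averaged strategies of regret-minimizing players approach a Nash equilibrium, which for rock-paper-scissors is the uniform strategy.

```python
import random

# Payoff to a player choosing the row action against the column action.
# Actions: 0 = rock, 1 = paper, 2 = scissors. The game is symmetric, so the
# same matrix serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform


def self_play(iterations=20_000, seed=0):
    """Return both players' time-averaged strategies after self-play."""
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        actions = [rng.choices(range(3), weights=s)[0] for s in strategies]
        for player in (0, 1):
            mine, theirs = actions[player], actions[1 - player]
            realized = PAYOFF[mine][theirs]
            for a in range(3):
                # Regret: how much better action a would have done.
                regrets[player][a] += PAYOFF[a][theirs] - realized
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play()):
        print(player, [round(p, 3) for p in avg])
```

Note that it is the averaged strategy, not the strategy played at any single iteration, that carries the convergence guarantee.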

Game Interactive Learning: A New Paradigm towards Intelligent Decision-Making

Xing, Wu, et al. 2023

CAAI Artificial Intelligence Research

Decision-making plays an essential role in various real-world systems like automatic driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios using deep reinforcement learning for intelligent decision-making have paved decision-making intelligence as a burgeoning research direction. In complex practical systems, however, factors like coupled distracting features, long-term interact links, and adversarial environments and opponents, make decision-making in practical applications challenging in modeling, computing, and explaining. This work proposes game interactive learning, a novel paradigm as a new approach towards intelligent decision-making in complex and adversarial environments. This novel paradigm highlights the function and role of a human in the process of intelligent decision-making in complex systems. It formalizes a new learning paradigm for exchanging information and knowledge between humans and the machine system. The proposed paradigm first inherits methods in game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results for practice. Human interactions are involved when the learning process needs guidance from additional knowledge and instructions, or the human wants to understand the learning machine better. We perform preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level War-game scenarios. Experimental results demonstrate the effectiveness of the proposed learning paradigm. 
KEYWORDS: decision-making; game interactive learning; human-computer interaction; game theory; machine learning

Real-world systems like automatic driving [1], traffic dispatching [2], information system management [3], and emergency command and control [4], inevitably involve decision-making procedures throughout their running process. The quality of the decision-making results, whether accomplished by the human or the machine, thus significantly affects these systems' performance. With the fast development of Artificial Intelligence (AI) in the last decades [5,6], especially deep learning models and algorithms, we have witnessed significant progress in high-performance intelligent perception models of audio, visual, and text data [7][8][9]. Due to the newly arising deep reinforcement learning algorithms, researchers have also made a substantial advance in developing human-level intelligent decision models in challenging computer games like Go [10], poker [11,12], and real-time strategy games [13][14][15]. The MuZero model [16] achieves expert-level performance in Go, Chess, Shogi, and Atari 2600 games simultaneously. These advances have paved decision-making intelligence as a growing research direction for further artificial intelligence developments in more complex sys...

“…Consequently, the system is ultimately dominated by defectors. To overcome this dilemma, many effective mechanisms are proposed to promote cooperation as the strategy of an individual is either cooperation or defection (hereafter called discrete strategy), for example, network structures [9,10], memory [11,12], aspiration [13,14], age structure [15,16], reputation [17,18], asymmetry [19][20][21], reward and punishment [22][23][24] and regret minimization [25].…”

Section: Introduction (mentioning)

confidence: 99%

Extending Q-learning to continuous and mixed strategy games based on spatial reciprocity

et al. 2023

Self Cite

The discrete strategy game, in which agents can only choose cooperation or defection, has received lots of attention. However, this hypothesis seems implausible in the real world, where choices may be continuous or mixed. Furthermore, when applying Q-learning to continuous or mixed strategy games, one of the challenges is that the learning space grows drastically as the number of states and actions rises. So, in this article, we redesign the Q-learning method by considering the spatial reciprocity, in which agents simply interact with their four neighbours to get the reward and learn the action by taking neighbours’ strategy into account. As a result, the learning state and action space is transformed into a 5 × 5 table that stores the state and action of the focal agent and its four neighbours, avoiding the curse of dimensionality caused by a continuous or mixed strategy game. The numerical simulation results reveal the striking differences between the three classes of games. In detail, the discrete strategy game is more vulnerable to the setting of relevant parameters, whereas the other two strategy games are relatively stable. At the same time, in terms of promoting cooperation, a mixed strategy game is always better than a continuous one.
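
As a rough illustration of learning with a fixed 5 × 5 table, the sketch below applies the standard tabular Q-learning update to a table with five states and five actions. The abstract does not say how the rows and columns are indexed, so the plain 0..4 indices, the epsilon-greedy rule, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for this example rather than the paper's design.

```python
import random

N_STATES = 5   # assumed: rows indexed 0..4
N_ACTIONS = 5  # assumed: columns indexed 0..4


def new_q_table():
    """A 5 x 5 table of Q-values, initialised to zero."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    best = max(row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, value in enumerate(row) if value == best])


def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    q = new_q_table()
    action = choose_action(q, state=0, epsilon=0.1, rng=rng)
    q_update(q, state=0, action=action, reward=1.0, next_state=1)
    print(q[0])
```

Because the table size is fixed at 25 entries, the memory and update cost per agent stay constant no matter how finely the underlying strategy is resolved, which is the point the abstract makes about avoiding the curse of dimensionality.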
