Dynamical interaction represents a fundamental coevolutionary rule that addresses the intricacies of cooperation in social dilemmas. It provides a normative account of how ties within interaction networks change in response to the behaviour of social partners. While considerable effort has gone into exploring the role of partner selection in fostering cooperation, our understanding of how agents learn to establish effective interaction patterns and adapt their connections accordingly remains limited. To bridge this knowledge gap, we leverage recent advancements in reinforcement learning and propose an adaptive interaction mechanism to investigate self-organization behaviour in the iterated prisoner's dilemma game. Within this framework, artificial agents are trained using a self-regarding Roth-Erev algorithm, utilizing reputation as a dynamic signal to update their willingness to engage with neighbours. Additionally, these agents are endowed with the capability to sever inactive connections. Simulation results demonstrate the effectiveness of utilizing reinforcement learning and local information from reputation to capture the dynamics of interactions. Notably, we discover that the entangled coevolution of strategy and interaction network can facilitate the emergence and maintenance of cooperation, although the optimal tolerance threshold for ineffective neighbours varies with the strength of the social dilemma. Furthermore, the emerging network topology presented in this work accurately captures the assortative mixing pattern observed in previous experiments and real-world evidence. Finally, we validate the simulation results through theoretical analysis and confirm the robustness of the proposed mechanism across populations of varying sizes and initial structures.
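For readers unfamiliar with the learning rule named above, the following is a minimal Python sketch of the basic Roth-Erev propensity update. It is an illustration under stated assumptions, not the mechanism proposed in this work: the two-action framing (abstain or interact with a neighbour), the `forgetting` parameter, the reward values, and all function names are hypothetical, and the way reputation enters the update is deliberately left out because it is not specified at this point in the text.

```python
import random


def choice_probabilities(propensities):
    """Basic Roth-Erev choice rule: probability proportional to propensity."""
    total = sum(propensities)
    return [q / total for q in propensities]


def roth_erev_update(propensities, chosen, reward, forgetting=0.0):
    """Return new propensities after `chosen` earned `reward`.

    All propensities decay by the factor (1 - forgetting); only the chosen
    action is then reinforced by the reward. Rewards are assumed to be
    non-negative so that propensities stay positive.
    """
    updated = [(1.0 - forgetting) * q for q in propensities]
    updated[chosen] += reward
    return updated


def choose(propensities, rng):
    """Sample an action index with weights given by the propensities."""
    return rng.choices(range(len(propensities)), weights=propensities, k=1)[0]


# Hypothetical usage: action 0 = abstain, action 1 = interact with a neighbour.
rng = random.Random(0)
q = [1.0, 1.0]  # assumed equal initial propensities
for _ in range(200):
    action = choose(q, rng)
    reward = 1.0 if action == 1 else 0.2  # illustrative payoffs only
    q = roth_erev_update(q, action, reward, forgetting=0.05)
p = choice_probabilities(q)
```

In this sketch an agent's willingness to interact is simply `p[1]`. Because only the chosen action is reinforced by the agent's own payoff, the rule is self-regarding in the sense that no information about other agents' payoffs is used.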
I. INTRODUCTION
Fostering cooperation among self-interested agents is a challenging task, as natural selection favours free-riding on others' efforts. Such scenarios are characterized as social dilemmas, where short-term individual incentives can conflict with long-term group interests, leading to collective irrationality [1]-[3]. Despite this, cooperation is ubiquitous in both natural and artificial systems [4], and it has played a vital role in the evolution of social species, above all in human social progress and civilization. Therefore, understanding the necessary conditions for cooperation has been an active research topic, given the apparent contradiction involved [5]. Evolutionary game theory (EGT) provides a comprehensive theoretical framework to address social dilemmas. Among these, the prisoner's dilemma (PD) is widely recognized as one of the most challenging scenarios

The authors would like to acknowledge the use of the Computational Shared Facility at The University of Manchester.