Learning against learning: evolutionary dynamics of reinforcement learning algorithms in strategic interactions

Kaisers, Michael

doi:10.26481/dis.20121217mk

Cited by 10 publications

(5 citation statements)

References 80 publications

“…It is concluded from (44) and (45) that, when … Similarly, for Agent u,v,2, based on [44], [46], we can derive…”

Section: Theorem (mentioning)

confidence: 96%

Task-Oriented Satellite-UAV Networks With Mobile-Edge Computing

Wei, Feng, Chen et al. 2024. IEEE Open J. Commun. Soc.

Networked robots have become crucial for unmanned applications since they can collaborate to complete complex tasks in remote/hazardous/depopulated areas. Due to the cost inefficiency of deploying cellular network infrastructure in these areas, hybrid satellite-UAV networks emerge as a promising solution. These networks provide seamless and on-demand connectivity for multiple robots with various task requirements, and support computation-intensive and latency-sensitive services through mobile edge computing (MEC)-based offloading. However, to complete tasks in limited times, the rapid collective movement of mobile robots may cause frequent service migration, and a large number of gathered robots may compete for limited bandwidth resources in satellite and UAV communications. As a result, offloading latency may increase significantly. To address this issue, the average completion time of multi-robot offloading in task-oriented satellite-UAV networks with MEC is formulated as an optimization problem. Unlike conventional mobility-aware MEC-based offloading schemes, joint optimization of mobility control, data offloading, and resource allocation is proposed using velocity control of multiple robots. According to Lyapunov optimization, the original optimization problem is simplified into minimizing the average completion time of offloading for all robots within UAV and satellite coverage. A multi-agent Q-learning algorithm, including multi-group dual-agent Q-learning, is proposed based on local state observation and global reward calculation. In each dual-agent Q-learning, one agent is responsible for velocity control and communication resource allocation, while the other is responsible for data offloading and computational resource allocation. The convergence of the proposed multi-agent Q-learning algorithm is also theoretically analyzed. 
Simulation results show that the proposed scheme can effectively reduce the offloading latency by up to 35% in the multi-robot environment over its conventional counterparts.

“…However, the factors affecting the converged bidding price are implicitly indicated therein. In [5], the numerical connection between EGT with RDEs and some baseline MARL algorithms is proved, implying that EGT with RDEs can explicitly reveal the factors affecting the converged result in MARL. Thus, in this letter, the correlation between WoLF-PHC and EGT is investigated and adopted to analyse the learning dynamics.…”

Section: Introduction (mentioning)

confidence: 93%

“…For EPs, the state refers to [𝜆 , , , 𝑃 , , ], the action is [ 𝑃 , , 𝜆 , ]. The EGT with RDEs, instead, presents the change of probability of multiple "players" selecting different "strategies", and these players will imitate the strategy of those who obtain the largest "payoff" [5]. Empirically, the strategy can be considered as the principle of selecting actions.…”

Section: A. Connections Between MARL and EGT (mentioning)

confidence: 99%

“…The numerical connection between the EGT and MARL lies in the speed of change of strategies and actions. If the speed of "strategy change" can be proved to be proportional to that of "policy change", then the expression of policy change can be directly replaced by that of strategy change [5], in which the factors affecting the converged result are explicitly indicated.…”

Section: A. Connections Between MARL and EGT (mentioning)

confidence: 99%

“…For EPs taking the price-maker strategy, they tend to submit a higher bidding price, to pursue more benefit while taking the risk of failure of bidding. Based on the derivations presented in [5], the change of probability of player 𝑥 and 𝑦 [10] selecting different strategies 𝑝 and 𝑞 can be written as:…”

Section: A. Connections Between MARL and EGT (mentioning)

confidence: 99%
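
The citation statements above describe EGT with RDEs as tracking how the probability of each "player" selecting each "strategy" changes over time. A minimal sketch of that idea follows, assuming the textbook two-population replicator dynamics and a prisoner's dilemma payoff matrix chosen purely for illustration. It is not the equation derived in the citing letter (which is elided above); the function names, payoff values, and Euler step size are all hypothetical.

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of two-population replicator dynamics.

    x, y : mixed strategies (probability vectors) of the row and column player.
    A    : payoff matrix of the row player, A[i, j] for row action i vs column action j.
    B    : payoff matrix of the column player, indexed the same way.
    """
    fx = A @ y          # expected payoff of each row action against y
    fy = B.T @ x        # expected payoff of each column action against x
    # An action's share grows in proportion to its payoff advantage
    # over the population average.
    dx = x * (fx - x @ fx)
    dy = y * (fy - y @ fy)
    return x + dt * dx, y + dt * dy

def simulate(x, y, A, B, steps=5000, dt=0.01):
    """Iterate the Euler step and return the final strategy pair."""
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
    return x, y

# Illustrative prisoner's dilemma: action 0 = cooperate, action 1 = defect.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])
B = A.T  # symmetric game: the column player faces the same payoffs
x, y = simulate(np.array([0.5, 0.5]), np.array([0.5, 0.5]), A, B)
print(x, y)  # both populations drift toward the dominant action (defect)
```

Because defection strictly dominates in this illustrative game, the probability mass of both players moves toward action 1, which makes the sketch easy to sanity-check by hand.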

Analysis of Evolutionary Dynamics for Bidding Strategy Driven by Multi-Agent Reinforcement Learning

Zhu, Chan et al. 2021. IEEE Trans. Power Syst.

In this letter, evolutionary game theory (EGT) with replication dynamic equations (RDEs) is adopted to explicitly determine the factors affecting energy providers' (EPs) willingness to use market power to uplift the price in the bidding procedure, which can be simulated using the win-or-learn-fast policy hill climbing (WoLF-PHC) algorithm as a multi-agent reinforcement learning (MARL) method. First, empirical and numerical connections between WoLF-PHC and RDEs are proved. Then, by formulating RDEs of the bidding procedure, three factors affecting the bidding strategy preference are revealed: the load demand, the severity of congestion, and the price cap. Finally, the impact of these factors on the converged bidding price is demonstrated in case studies by simulating the bidding procedure driven by WoLF-PHC.
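
For readers unfamiliar with WoLF-PHC, the following is a rough sketch of a single learner, assuming the commonly cited win-or-learn-fast policy hill climbing update simplified to a stateless (single-state, no discounting) setting. It is not the bidding simulation from the letter: the two-action reward vector, the learning rates, the exploration rate, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def wolf_phc_step(Q, pi, pi_avg, count, action, reward,
                  alpha=0.1, delta_win=0.01, delta_lose=0.04):
    """One stateless WoLF-PHC update (simplifying assumption: one state)."""
    Q, pi = Q.copy(), pi.copy()
    n = len(Q)
    # Value update for the action that was played.
    Q[action] += alpha * (reward - Q[action])
    # Running average of the policies used so far.
    count += 1
    pi_avg = pi_avg + (pi - pi_avg) / count
    # "Winning" if the current policy looks better than the average policy:
    # learn cautiously when winning, fast when losing.
    delta = delta_win if pi @ Q > pi_avg @ Q else delta_lose
    # Hill climbing: shift probability mass toward the greedy action.
    best = int(np.argmax(Q))
    for a in range(n):
        if a != best:
            step = min(pi[a], delta / (n - 1))
            pi[a] -= step
            pi[best] += step
    pi /= pi.sum()  # guard against floating-point drift
    return Q, pi, pi_avg, count

def run_bandit(rewards, steps=2000, epsilon=0.1, seed=0):
    """Run the learner on fixed per-action rewards (hypothetical test bed)."""
    rng = np.random.default_rng(seed)
    n = len(rewards)
    Q = np.zeros(n)
    pi = np.full(n, 1.0 / n)
    pi_avg = pi.copy()
    count = 0
    for _ in range(steps):
        # Epsilon exploration keeps every action reachable.
        if rng.random() < epsilon:
            action = int(rng.integers(n))
        else:
            action = int(rng.choice(n, p=pi))
        Q, pi, pi_avg, count = wolf_phc_step(
            Q, pi, pi_avg, count, action, rewards[action])
    return Q, pi

Q, pi = run_bandit([0.0, 1.0])
print(Q, pi)  # the policy concentrates on the higher-reward action
```

The two step sizes are the defining feature: a smaller `delta_win` than `delta_lose` makes the policy change slowly when it outperforms its own historical average and quickly otherwise.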

Evolutionary Game Theory as a Catalyst in Smart Grids: From Theoretical Insights to Practical Strategies

Karaki, Al-Fagih 2024. IEEE Access
