Social Agents Playing a Periodical Policy

Nowé, Ann; Parent, Johan; Verbeeck, Katja

doi:10.1007/3-540-44795-4_33

Cited by 12 publications

(8 citation statements)

References 5 publications


“…In a common interest game, ESRL is able to find one of the Pareto optimal solutions of the game. In a conflicting interest game, we show that ESRL agents learn optimal fair, possibly periodical policies [17,26]. Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection.…”

Section: Introduction (mentioning)

confidence: 98%


Exploring selfish reinforcement learning in repeated games with stochastic rewards

Verbeeck, Nowé, Parent, et al. 2006

Auton Agent Multi-Agent Syst

Self Cite


In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, for common interest and conflicting interest games respectively. Both are based on the same idea: an agent explores by temporarily excluding some of the local actions from its private action space, which gives the team of agents the opportunity to look for better solutions in a reduced joint action space. At a later stage these two algorithms are combined into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards, they are flexible in learning different solution concepts, and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive load balancing of parallel applications, is included.

K. Verbeeck (B) Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium
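The exclusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' ESRL implementation: the payoff matrix, the function names, and the `converge` stand-in (alternating best responses instead of a real reinforcement learner) are all assumptions made for illustration. It only shows how excluding the action an agent has converged to lets the pair search a reduced joint action space.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical common-interest game: both agents receive PAYOFF[row][col].
PAYOFF: List[List[int]] = [
    [5, 0, 0],
    [0, 7, 0],
    [0, 0, 10],
]


def converge(row_allowed: Set[int], col_allowed: Set[int]) -> Tuple[int, int]:
    """Stand-in for a learning phase (assumption): alternate best responses
    over the currently allowed actions until the joint action is stable."""
    r, c = min(row_allowed), min(col_allowed)
    while True:
        new_c = max(sorted(col_allowed), key=lambda a: PAYOFF[r][a])
        new_r = max(sorted(row_allowed), key=lambda a: PAYOFF[a][new_c])
        if (new_r, new_c) == (r, c):
            return r, c
        r, c = new_r, new_c


def explore_with_exclusions() -> Tuple[Optional[Tuple[int, int]], float]:
    """Repeat: converge, remember the best reward seen, then let each agent
    exclude the action it just converged to from its private action set."""
    row_allowed = set(range(len(PAYOFF)))
    col_allowed = set(range(len(PAYOFF[0])))
    best_joint: Optional[Tuple[int, int]] = None
    best_reward = float("-inf")
    while row_allowed and col_allowed:
        r, c = converge(row_allowed, col_allowed)
        reward = PAYOFF[r][c]
        if reward > best_reward:
            best_joint, best_reward = (r, c), reward
        # Each agent only touches its own action set, using its own action.
        row_allowed.discard(r)
        col_allowed.discard(c)
    return best_joint, best_reward
```

With this particular matrix the stand-in learner first settles on the suboptimal joint action (0, 0); the exclusions then expose (1, 1) and finally (2, 2). In the sketch the best joint action is tracked in one place for brevity, whereas independent agents would each remember only their own best action and reward.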


“…We call a solution optimally fair when there is no other solution that is also fair for the agents but gives the agents more reward on average. Periodical policies were first introduced in [17].…”

Section: Introduction (mentioning)

confidence: 99%


“…We compare our results with a related periodical policy [5], and simulations show that agents using our adaptive strategy are able to achieve more optimal fairness results in the sense of obtaining higher utilitarian social welfare. Besides, the agents using our adaptive strategy can achieve fairness with less payoff cost compared with periodical policy when period length becomes smaller.…”

Section: Conclusion and Further Work (mentioning)

confidence: 96%

“…Nowé et al [5] [7] propose a periodical policy for achieving fair outcomes among multiple agents in a distributed way. This policy can be divided into two periods: reinforcement learning period and communication period.…”

Section: B Fairness Through Multi-agent Reinforcement Learning (mentioning)

confidence: 99%
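The two-period structure mentioned in this statement (a learning period followed by a communication period) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited policy itself: the payoff table, the period length, the rule that the agent currently ahead excludes the action it just played, and the `learning_period` stand-in (which reads the payoff table directly rather than learning from rewards) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical conflicting-interest game: joint action -> (payoff 0, payoff 1).
PAYOFFS: Dict[Tuple[int, int], Tuple[float, float]] = {
    (0, 0): (2.0, 1.0),
    (1, 1): (1.0, 2.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
}
ACTIONS: Set[int] = {0, 1}


def learning_period(allowed: List[Set[int]]) -> Tuple[int, int]:
    """Stand-in for independent learning (assumption): settle on the allowed
    joint action with the highest total payoff, first candidate on ties."""
    candidates = [(a, b) for a in sorted(allowed[0]) for b in sorted(allowed[1])]
    return max(candidates, key=lambda joint: sum(PAYOFFS[joint]))


def communication_period(cumulative: List[float],
                         joint: Tuple[int, int]) -> List[Set[int]]:
    """Agents compare cumulative payoffs; the one strictly ahead excludes the
    action it just played (assumed rule), everyone else restores all actions."""
    allowed = [set(ACTIONS), set(ACTIONS)]
    if cumulative[0] != cumulative[1]:
        leader = 0 if cumulative[0] > cumulative[1] else 1
        allowed[leader].discard(joint[leader])
    return allowed


def run(periods: int, period_length: int = 10):
    """Alternate learning and communication periods; return the joint action
    played in each period and the cumulative payoff of each agent."""
    cumulative = [0.0, 0.0]
    allowed = [set(ACTIONS), set(ACTIONS)]
    history = []
    for _ in range(periods):
        joint = learning_period(allowed)
        for i in range(2):
            cumulative[i] += period_length * PAYOFFS[joint][i]
        history.append(joint)
        allowed = communication_period(cumulative, joint)
    return history, cumulative
```

In this sketch the agents alternate between the two coordinated outcomes, so cumulative payoffs are equal after every even number of periods. The communication step exchanges only cumulative payoffs, which matches the "limited communication" described in the ESRL abstract only loosely.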

“…In this section, the performance of the agents adopting our adaptive strategy is evaluated by comparing with that when using the periodical policy in [5]. The experimental setting follows the problem specification described in III-A and the payoff matrix for the two agents' conflicting interest game is instantiated as the one in Fig.1(a).…”

Section: Experimental Evaluation (mentioning)

confidence: 99%


Strategy and Fairness in Repeated Two-agent Interaction

Hao, Leung 2010

2010 22nd IEEE International Conference on Tools With Artificial Intelligence


The criterion of fairness has not been given much attention in research on multi-agent learning. We propose an adaptive strategy for agents to achieve fairness in repeated two-agent games with conflicting interests. In our strategy, each agent is equipped with an inequity-aversion-based fairness model and makes its decision according to its attractiveness for each action. Each agent also adjusts its own attitudes adaptively on the basis of the previous outcome and the payoff distribution of the agents in the system. Our goal is to reach fairness in the sense of obtaining equal accumulated payoffs for each agent. Simulation results show that agents using our strategy can coordinate well with each other and achieve fairness with less payoff cost than previous work.
