In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards, by using coordinated exploration. First, two ESRL algorithms for respectively common interest and conflicting interest games are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private action space, to give the team of agents the opportunity to look for better solutions in a reduced joint action space. In a latter stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL only needs limited communication to learn a fair periodical policy, resulting in a good overall policy. Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection. A real-life experiment, i.e. adaptive load-balancing of parallel applications is added.K. Verbeeck (B) Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium
This paper proposes an incremental approach for building solutions using evolutionary computation. It presents a simple evolutionary model called a Transition model in which partial solutions are constructed that interact to provide larger solutions. An evolutionary process is used to merge these partial solutions into a full solution for the problem at hand. The paper provides a preliminary study on the evolutionary dynamics of this model as well as an empirical comparison with other evolutionary techniques on binary constraint satisfaction.
Coordination is an important issue in multiagent systems. Within the stochastic game framework this problem translates to policy learning in a joint action space. This technique however suffers some important drawbacks like the assumption of the existence of a unique Nash equilibrium and synchronicity, the need for central control, the cost of communication, etc. Moreover in general sum games it is not always clear which policies should be learned. Playing pure Nash equilibria is often unfair to at least one of the players, while playing a mixed strategy doesn't give any guarantee for coordination and usually results in a sub-optimal payoff for all agents. In this work we show the usefulness of periodical policies, which arise as a side effect of the fairness conditions used by the agents. We are interested in games which assume competition between the players, but where the overall performance can only be as good as the performance of the poorest player. Players are social distributed reinforcement learners, who have to learn to equalize their payoff. Our approach is illustrated on synchronous one-step games as well as on asynchronous job scheduling games.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.