Abstract. Conventional reinforcement learning approaches have difficulty handling policy changes by opponents, because such changes may alter the state transition probabilities whose stability is necessary for learning to converge. This paper presents a method of multi-module reinforcement learning in a multiagent environment, by which the learning agent can adapt itself to the policy changes of its opponents. We show a preliminary result for a simple soccer situation in the context of RoboCup.
Abstract. Existing reinforcement learning approaches suffer from the policy alternation of others in multiagent dynamic environments such as RoboCup competitions, since other agents' behaviors may cause sudden changes in the state transition probabilities whose constancy is necessary for learning to converge. A modular learning approach can solve this problem if the learning agent assigns each module to one situation in which that module can regard the state transition probabilities as constant. This paper presents a method of modular learning in a multiagent environment, by which the learning agent can adapt its behaviors to the situations that arise from the other agent's behaviors. Scheduling of learning is introduced to avoid the complexity of autonomous situation assignment.
Abstract. Existing reinforcement learning approaches suffer from the policy alternation of others in multiagent dynamic environments. A typical example is a RoboCup competition, where other agents' behaviors may cause sudden changes in the state transition probabilities whose constancy is needed for learning to converge. The keys to simultaneous learning of competitive behaviors in such an environment are:
• a modular learning system for adaptation to the policy alternation of others; and
• the introduction of macro actions to reduce the search space during simultaneous learning.
This paper presents a method of modular learning in a multiagent environment in which the learning agents can simultaneously learn their behaviors and adapt themselves to the situations that arise from the others' behaviors.
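To make the modular idea concrete, here is a minimal sketch (not the authors' implementation) of one common realization: each module keeps its own tabular Q-function plus a count-based transition model, and the agent activates the module whose model best explains the most recent transition, i.e. the module for which the dynamics look "constant". The class names, update rules, and Laplace smoothing below are all illustrative assumptions, not details from the papers.

```python
import random

class QModule:
    """One learning module: tabular Q-learning plus a count-based
    transition model used to score how well this module fits the
    currently observed dynamics (illustrative assumption)."""
    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        # Laplace-smoothed transition counts: counts[s][a][s'] starts at 1.
        self.counts = [[[1] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def trans_prob(self, s, a, s2):
        row = self.counts[s][a]
        return row[s2] / sum(row)

    def update(self, s, a, r, s2):
        self.counts[s][a][s2] += 1
        best = max(self.q[s2])
        self.q[s][a] += self.alpha * (r + self.gamma * best - self.q[s][a])

class MultiModuleAgent:
    """Keeps one module per assumed opponent policy; after each step it
    activates the module whose transition model assigns the highest
    likelihood to the transition that was just observed."""
    def __init__(self, n_modules, n_states, n_actions):
        self.modules = [QModule(n_states, n_actions)
                        for _ in range(n_modules)]
        self.active = 0

    def select_module(self, s, a, s2):
        self.active = max(
            range(len(self.modules)),
            key=lambda i: self.modules[i].trans_prob(s, a, s2))

    def act(self, s, eps=0.1):
        q = self.modules[self.active].q[s]
        if random.random() < eps:
            return random.randrange(len(q))
        return q.index(max(q))
```

In this sketch, a macro action would simply be an extra entry in the action set that expands to a fixed primitive sequence, shrinking the number of decisions each module must learn over; the papers' actual mechanisms for situation assignment and macro actions may differ.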