In this paper, an evolutionary reinforcement learning system with time-varying parameters that can learn an appropriate policy in dynamical POMDPs is proposed. The proposed system has time-varying parameters that can be adjusted by reinforcement learning. Hence, the system can adapt to the time variation of a dynamical environment even when that variation cannot be observed. In addition, the state space of the environment is divided evolutionarily, so one need not divide the state space in advance. The efficacy of the proposed system is shown by a mobile robot control simulation in an environment belonging to dynamical POMDPs: a passage with gates that repeatedly open and close.
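A minimal sketch of the idea, assuming a single policy parameter that oscillates with the time step and whose offset, amplitude, and frequency are tuned by a reward-driven hill-climbing (evolutionary-style) update. The class name, the sinusoidal form, and the update rule are illustrative assumptions, not the paper's algorithm:

```python
import math
import random

class TimeVaryingParameter:
    def __init__(self):
        # [offset, amplitude, frequency]: all three are learned.
        self.theta = [0.0, 0.1, 1.0]

    def value(self, t):
        # The parameter changes with the time step t itself, so the policy
        # can track periodic environment changes (e.g. gates opening and
        # closing) even when those changes are not observable.
        offset, amp, freq = self.theta
        return offset + amp * math.sin(freq * t)

def episode_return(param):
    # Stand-in for one rollout: return is highest when the time-varying
    # parameter tracks a hidden oscillation of the environment.
    return -sum((param.value(t) - math.sin(0.3 * t)) ** 2 for t in range(100))

param = TimeVaryingParameter()
for _ in range(2000):
    noise = [random.gauss(0.0, 0.05) for _ in range(3)]
    before = episode_return(param)
    param.theta = [th + n for th, n in zip(param.theta, noise)]
    if episode_return(param) < before:  # revert perturbations that hurt return
        param.theta = [th - n for th, n in zip(param.theta, noise)]
```

The keep-if-better loop is the evolutionary flavor of the update: parameters are mutated, evaluated over a whole episode, and the mutation survives only if the episode return improves.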
We propose an adaptive probability density function (PDF) for selecting an effective action in reinforcement learning (RL). A uniform or normal distribution is often used to select an action; when these functions are used, however, information about the search direction is not considered. By exploiting this information, the proposed method enables RL to reduce the number of trials, which is essential for learning in real environments. Furthermore, the proposed method can be applied easily to various RL methods, for example, actor-critic and the stochastic gradient ascent method. The performance of the proposed method is demonstrated by computer simulations.
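A hedged sketch of how such action selection might look, using a split normal (two-sided Gaussian) as an assumed stand-in for the asymmetric, direction-aware PDF: the side of the mean pointing along the recent search direction gets the larger spread, so exploration is biased toward actions that recently improved the return. The function `sample_asymmetric` and its parameters are hypothetical names, not the paper's notation:

```python
import random

def sample_asymmetric(mean, sigma_toward, sigma_away, direction):
    # Split-normal sample: with probability proportional to its scale, pick
    # the side of the mean along `direction` (+1 or -1) and draw a
    # half-normal offset with that side's spread. A larger sigma_toward
    # skews exploration along the promising search direction.
    p_toward = sigma_toward / (sigma_toward + sigma_away)
    side = direction if random.random() < p_toward else -direction
    sigma = sigma_toward if side == direction else sigma_away
    return mean + side * abs(random.gauss(0.0, sigma))

# Symmetric baseline for comparison: random.gauss(mean, sigma) spreads
# trials equally on both sides and ignores which side recently paid off.
action = sample_asymmetric(mean=0.5, sigma_toward=0.3, sigma_away=0.1, direction=+1)
```

When the two scales are equal the sampler reduces to an ordinary normal distribution, so a method like actor-critic can adopt it as a drop-in replacement for its Gaussian action noise.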