This paper presents a novel class of algorithms, called Heuristically-Accelerated Multiagent Reinforcement Learning (HAMRL), which allows the use of heuristics to speed up well-known multiagent reinforcement learning (RL) algorithms such as the Minimax-Q. Such HAMRL algorithms are characterized by a heuristic function, which suggests the selection of particular actions over others. This function represents an initial action selection policy, which can be handcrafted, extracted from previous experience in distinct domains, or learnt from observation. To validate the proposal, a thorough theoretical analysis proving the convergence of four algorithms from the HAMRL class (HAMMQ, HAMQ(λ), HAMQS, and HAMS) is presented. In addition, a comprehensive systematical evaluation was conducted in two distinct adversarial domains. The results show that even the most straightforward heuristics can produce virtually optimal action selection policies in much fewer episodes, significantly improving the performance of the HAMRL over vanilla RL algorithms.
This paper investigates how to make improved action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies using any RL algorithm can be very time consuming, we propose to combine RL algorithms with heuristic functions for selecting promising actions during the learning process. With this aim, we investigate the use of heuristics for increasing the rate of convergence of RL algorithms and contribute with a new learning algorithm, Heuristically Accelerated Q-learning (HAQL), which incorporates heuristics for action selection to the Q-Learning algorithm. Experimental results on robot navigation show that the use of even very simple heuristic functions results in significant performance enhancement of the learning rate.
Abstract. This work presents a new algorithm, called Heuristically Accelerated Q-Learning (HAQL), that allows the use of heuristics to speed up the well-known Reinforcement Learning algorithm Q-learning. A heuristic function H that influences the choice of the actions characterizes the HAQL algorithm. The heuristic function is strongly associated with the policy: it indicates that an action must be taken instead of another. This work also proposes an automatic method for the extraction of the heuristic function H from the learning process, called Heuristic from Exploration. Finally, experimental results shows that even a very simple heuristic results in a significant enhancement of performance of the reinforcement learning algorithm.
The goal of this paper is to propose and analyse a transfer learning meta-algorithm that allows the implementation of distinct methods using heuristics to accelerate a Reinforcement Learning procedure in one domain (the target) that are obtained from another (simpler) domain (the source domain). This meta-algorithm works in three stages: first, it uses a Reinforcement Learning step to learn a task on the source domain, storing the knowledge thus obtained in a case base; second, it does an unsupervised mapping of the source-domain actions to the target-domain actions; and, third, the case base obtained in the first stage is used as heuristics to speed up the learning process in the target domain.A set of empirical evaluations were conducted in two target domains: the 3D mountain car (using a learned case base from a 2D simulation) and stability learning for a humanoid robot in the Robocup 3D Soccer Simulator (that uses knowledge learned from the Acrobot domain). The results attest that our transfer learning algorithm outperforms recent heuristically-accelerated reinforcement learning and transfer learning algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.