We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from similar, previously learned policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity function to estimate the similarity of past policies with respect to a new one. We provide empirical results demonstrating that Policy Reuse improves the learning performance over different strategies that learn without reuse. Interestingly, and almost as a side effect, Policy Reuse also identifies classes of similar policies, revealing a basis of core policies of the domain. We demonstrate that such a basis can be built incrementally, contributing to the learning of the structure of a domain.
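The three-way choice described in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `choose_action`, the parameters `psi` (probability of following a past policy) and `epsilon` (probability of random exploration), and the dictionary-based Q-table are all assumptions made for illustration.

```python
import random


def choose_action(state, q_new, past_policy, actions, psi, epsilon, rng=random):
    """Pick an action from one of three sources (illustrative sketch only).

    With probability `psi`, follow the past policy (reuse bias).
    Otherwise, with probability `epsilon`, take a random action.
    Otherwise, act greedily with respect to the policy being learned.
    All names and the mixing scheme are assumptions, not the paper's API.
    """
    if rng.random() < psi:
        # Exploit a previously learned policy.
        return past_policy(state)
    if rng.random() < epsilon:
        # Explore a random action.
        return rng.choice(actions)
    # Exploit the ongoing learned policy (unseen pairs default to 0.0).
    return max(actions, key=lambda a: q_new.get((state, a), 0.0))
```

Setting `psi` to 0 reduces this sketch to ordinary epsilon-greedy selection, which makes it easy to compare learning with and without reuse under the same code path.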
In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises: how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
This paper describes the development of and the first experiments in a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for or renewing their Identity Card. The system translates official explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the hand movements). Two proposals for natural language translation have been evaluated: a rule-based translation module (that computes sign confidence measures from the word confidence measures obtained in the speech recognition module) and a statistical translation module (in this case, parallel corpora were used for training the statistical model). The best configuration reported 31.6% SER (Sign Error Rate) and 0.5780 BLEU (BiLingual Evaluation Understudy). The paper also describes the eSIGN 3D avatar animation module (considering the sign confidence), and the limitations found when implementing a strategy for reducing the delay between the spoken utterance and the sign sequence animation.
Recent discoveries in automated planning are broadening the scope of planners, from toy problems to real applications. However, applying automated planners to real-world problems is far from simple. On the one hand, the definition of accurate action models for planning is still a bottleneck. On the other hand, off-the-shelf planners fail to scale up and to provide good solutions in many domains. In these problematic domains, planners can exploit domain-specific control knowledge to improve their performance in terms of both speed and quality of the solutions. However, manual definition of control knowledge is quite difficult. This paper reviews recent techniques in machine learning for the automatic definition of planning knowledge. It has been organized according to the target of the learning process: automatic definition of planning action models and automatic definition of planning control knowledge. In addition, the paper reviews the advances in the related field of reinforcement learning.
In pattern classification problems, many works have been carried out with the aim of designing good classifiers from different perspectives. These works achieve very good results in many domains. However, in general they are very dependent on some crucial parameters involved in the design. These parameters have to be found by a trial and error process or by some automatic methods, like heuristic search and genetic algorithms, that strongly decrease the performance of the method. For instance, in nearest prototype approaches, the main parameters are the number of prototypes to use, the initial set, and a smoothing parameter. In this work, an evolutionary approach based on Nearest Prototype Classifier (ENPC) is introduced where no parameters are involved, thus overcoming all the problems that classical methods have in tuning and searching for the appropriate values. The algorithm is based on the evolution of a set of prototypes that can execute several operators in order to increase their quality in a local sense, with a high classification accuracy emerging for the whole classifier. This new approach has been tested using four different classical domains, including such artificial distributions as spiral and uniformly distributed data sets, the Iris Data Set and an application domain about diabetes. In all the cases, the experiments show successful results, not only in the classification accuracy, but also in the number and distribution of the prototypes achieved.
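For readers unfamiliar with nearest prototype classification, the basic decision rule that such classifiers share can be sketched as follows. This shows only the generic rule (assign the label of the closest prototype under Euclidean distance), not the evolutionary operators of ENPC, which the abstract does not detail. The function name `nearest_prototype_label` and the data layout are assumptions.

```python
import math


def nearest_prototype_label(x, prototypes):
    """Return the label of the prototype closest to `x`.

    `prototypes` is a list of (vector, label) pairs; distance is Euclidean.
    This is the generic nearest-prototype rule, not the ENPC algorithm.
    """
    closest = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return closest[1]


# Two hypothetical prototypes in a 2D feature space.
prototypes = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
label = nearest_prototype_label((1.0, 1.0), prototypes)
```

In a classifier of this kind, accuracy depends on how many prototypes there are and where they sit, which is why the number and placement of prototypes are the parameters the abstract is concerned with.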