Despite groundbreaking progress in reinforcement learning for robotics, gameplay, and other complex domains, major challenges remain in applying reinforcement learning to the evolving, open-world problems often found in critical application spaces. Reinforcement learning solutions tend to generalize poorly when exposed to new tasks outside of the data distribution they are trained on, prompting an interest in continual learning algorithms. In tandem with research on continual learning algorithms, there is a need for challenge environments, carefully designed experiments, and metrics to assess research progress. We address the latter need by introducing a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer), a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex and evolving evaluation curricula. In contrast to procedurally generated worlds with randomized components, we have developed a systematic approach to defining curricula in response to controlled changes, with accompanying metrics to assess transfer, performance recovery, and data efficiency. Taken together, the L2Explorer environment and evaluation approach provide a framework for developing future evaluation methodologies in open-world settings and rigorously evaluating approaches to lifelong learning.

In recent years, Deep Reinforcement Learning (DRL) approaches have begun to deliver powerful results for a variety of compelling domains, including games such as Chess, Go, and Shogi Silver et al. [2018]; Atari video games Mnih et al. [2013]; more complex strategy video games Berner et al. [2019], Vinyals et al. [2019]; and dexterous robotic manipulation Rajeswaran et al. [2017]. Despite the groundbreaking success in training autonomous agents, the resulting policies tend to be very brittle and generalize poorly Chan et al. [2019].
When presented with a new task or a task variant, DRL approaches are susceptible to a performance drop Zhang et al. [2018], Kirk et al. [2021] due to the catastrophic forgetting problem French [1999], McCloskey and Cohen [1989], which may not be overcome by domain randomization strategies alone. As the field moves from fixed environments to evolving, open-world scenarios, current DRL approaches will be insufficient.

This performance gap has led to an interest in Continual Learning, which seeks to design algorithms that learn over sequences of tasks. In the related, but broader, concept of Lifelong Learning Chen and Liu [2018], an agent learns over a lifetime of experiences (see Fig. 1) in an evolving environment (for purposes of this paper, however, we treat continual learning as synonymous with lifelong learning, as our approach is applicable to both concepts). Much recent work has focused on supervised classification under distribution shifts Song et al. [2020] and on learning a sequence of tasks Parisi et al. [2019], Hsu et al. [2018]. Continual RL Khetar...
We consider the problem in regression analysis of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling leads to insights about the subpopulations and their associations with the regression target. We introduce a discriminative model that simultaneously learns cadre-assignment and target-prediction rules. Sparsity-inducing priors are placed on the model parameters, under which independent feature selection is performed for both the cadre-assignment and target-prediction processes. We learn models using adaptive step size stochastic gradient descent, and we assess cadre quality with bootstrapped sample analysis. We present simulated results showing that, when the true clustering rule does not depend on the entire set of features, our method significantly outperforms methods that learn subpopulation-discovery and target-prediction rules separately. In a materials-by-design case study, our model provides state-of-the-art prediction of polymer glass transition temperature. Importantly, the method identifies cadres of polymers that respond differently to structural perturbations, thus providing design insight for targeting or avoiding specific transition temperature ranges. It identifies chemically meaningful cadres, each with interpretable models. Further experimental results show that cadre methods generalize competitively with linear and nonlinear regression models and can identify robust subpopulations.
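To make the idea of jointly using cadre-assignment and target-prediction rules more concrete, the following is a minimal sketch of a forward pass only. It is not taken from the paper: the softmax-over-distances assignment rule, the per-cadre linear predictors, and every name (`cadre_predict`, `centers`, `feature_weights`, `W`, `b`, `gamma`) are illustrative assumptions. The sparsity-inducing priors, the stochastic gradient descent training, and the bootstrap analysis described above are omitted.

```python
import numpy as np

def cadre_predict(X, centers, feature_weights, W, b, gamma=1.0):
    """Soft cadre assignment followed by per-cadre linear prediction.

    All parameter names are hypothetical, chosen for illustration:
    X: (n, p) features; centers: (m, p) cadre centers;
    feature_weights: (p,) nonnegative weights on assignment features;
    W: (m, p) per-cadre regression weights; b: (m,) per-cadre intercepts.
    """
    # Weighted squared distance from each observation to each cadre center.
    diff = X[:, None, :] - centers[None, :, :]            # (n, m, p)
    dist = np.sum(feature_weights * diff ** 2, axis=2)    # (n, m)

    # Softmax over cadres, shifted by the row maximum for numerical stability.
    logits = -gamma * dist
    logits = logits - logits.max(axis=1, keepdims=True)
    memberships = np.exp(logits)
    memberships /= memberships.sum(axis=1, keepdims=True)  # rows sum to 1

    # Each cadre's linear prediction, mixed by the soft memberships.
    per_cadre = X @ W.T + b                                # (n, m)
    y_hat = np.sum(memberships * per_cadre, axis=1)        # (n,)
    return y_hat, memberships
```

In this sketch, a zero entry in `feature_weights` removes that feature from the assignment rule, and a zero entry in a row of `W` removes it from that cadre's prediction rule. This is one way the two processes could select features independently, as the abstract describes.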