We explore incremental assimilation of new knowledge by sequential learning. Of particular interest is how a network of many knowledge layers can be constructed in an on-line manner, such that the learned units represent building blocks of knowledge that serve to compress the overall representation and facilitate transfer. We motivate the need for many layers of knowledge, and we advocate sequential learning as an avenue for promoting the construction of layered knowledge structures. Finally, our novel STL algorithm demonstrates a method for simultaneously acquiring and organizing a collection of concepts and functions as a network from a stream of unstructured information.
Automatic transfer of learned knowledge from one task or domain to another offers great potential to simplify and expedite the construction and deployment of intelligent systems. In practice, however, there are many barriers to achieving this goal. In this article, we present a prototype system for the real-world context of transferring knowledge of American football from video observation to control in a game simulator. We trace an example play from the raw video through execution and adaptation in the simulator, highlighting the system's component algorithms along with issues of complexity, generality, and scale. We then conclude with a discussion of the implications of this work for other applications, along with several possible improvements.
We introduce relational temporal difference learning as an effective approach to solving multi-agent Markov decision problems with large state spaces. Our algorithm uses temporal difference reinforcement to learn a distributed value function represented over a conceptual hierarchy of relational predicates. We present experiments using two domains from the General Game Playing repository, in which we observe that our system achieves higher learning rates than nonrelational methods. We also discuss related work and directions for future research.

Background and Motivation

Most research in AI views intelligent behavior as search through a problem space to achieve goals. Directing that search is crucial to an agent's success, but crafting search-control heuristics manually is difficult and prone to error. An alternative response is to acquire such heuristic knowledge through learning. One common approach formulates this task as learning control policies from delayed reward, with policies encoded by expected value functions over Markov decision processes (Sutton & Barto, 1998). This general approach to reinforcement learning has been studied in many settings and from many perspectives.

Most work in this tradition uses limited representations and downplays the role of background knowledge. As a result, typical systems search a very large state space and thus learn far more slowly than do humans placed in similar situations. Research on temporal abstraction (Dietterich, 2000) and state abstraction (Asadi & Huber, 2004) aims to increase learning rates, but few efforts have utilized the more powerful relational representations that are standard in other AI subfields. Recent work on relational reinforcement learning (Dzeroski et al., 2001) uses first-order representations to provide effective abstraction, but it does not take advantage of action models, which are an important source of knowledge in many domains.

In this paper, we report a new approach to learning from delayed reward in multi-player games. Our framework is similar to relational reinforcement learning in its reliance on first-order representations. However, it employs a variant of temporal differencing, which is more appropriate than Q-learning when an action model is available, as Tesauro (1994) and Baxter et al. (1998) have demonstrated.

As in Dzeroski et al.'s work, we use a relational representation to support effective generalization across states, which should produce more rapid learning. However, rather than using relational regression trees to encode expected values, we use a factored representation that associates component values with relational predicates. These are combined into an overall score, much as in traditional state evaluation functions. Our work offers a novel approach to combining ideas from relational reinforcement learning and feature-based temporal difference learning.

In the next section, we describe ou...
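The factored value function described above lends itself to a compact implementation: each relational predicate carries a component value, a state's value is the sum of the components for the predicates that hold in it, and temporal-difference errors are distributed across those active predicates. The following Python sketch illustrates this scheme under simplifying assumptions (ground predicates encoded as tuples, a plain TD(0) update, one tabular weight per predicate); the class and variable names are hypothetical, and this is not the paper's implementation.

```python
from collections import defaultdict

class RelationalTDLearner:
    """Sketch of a linear, predicate-factored state-value function
    trained by TD(0); a simplified stand-in for the paper's method."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.weights = defaultdict(float)  # one component value per predicate

    def value(self, predicates):
        # Overall score: sum the component values of the relational
        # predicates that hold in the state, as in a classical
        # feature-based evaluation function.
        return sum(self.weights[p] for p in predicates)

    def update(self, predicates, reward, next_predicates):
        # TD(0) error; next_predicates would come from the action model's
        # predicted successor state when a model is available.
        delta = (reward
                 + self.gamma * self.value(next_predicates)
                 - self.value(predicates))
        # Distribute the correction over the predicates active in the state.
        for p in predicates:
            self.weights[p] += self.alpha * delta


# Toy usage with ground relational facts encoded as tuples.
learner = RelationalTDLearner()
s      = {("control", "player1"), ("adjacent", "piece1", "goal")}
s_next = {("control", "player1"), ("at", "piece1", "goal")}
learner.update(s, reward=1.0, next_predicates=s_next)
print(learner.value(s))
```

Because each weight is shared by every state in which its predicate holds, an update made in one state immediately adjusts the estimated value of all states satisfying the same relations, which is the intended source of the generalization, and hence the faster learning, that the abstract reports.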