2015
DOI: 10.1109/tcyb.2014.2319733

Stochastic Abstract Policies: Generalizing Knowledge to Improve Reinforcement Learning

Abstract: Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stoc…

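To make the setting concrete, the sketch below shows one way a stochastic abstract policy learned from source tasks could guide exploration at the start of a target task. This is a minimal illustration under assumed interfaces, not the algorithm from the paper; `AbstractPolicy`, `state_abstraction`, and `action_grounding` are hypothetical names.

```python
# Illustrative sketch only (not the paper's algorithm). It assumes a stochastic
# abstract policy is a mapping from abstract states to distributions over
# abstract actions, and that the hypothetical helpers `state_abstraction` and
# `action_grounding` relate target-task states/actions to that abstract space.
import random

class AbstractPolicy:
    """pi(abstract_action | abstract_state), stored as a nested dict."""
    def __init__(self, table):
        self.table = table  # {abstract_state: {abstract_action: probability}}

    def sample(self, abstract_state):
        dist = self.table.get(abstract_state)
        if not dist:
            return None
        actions, weights = zip(*dist.items())
        return random.choices(actions, weights=weights, k=1)[0]

def choose_action(state, actions, q_values, abstract_policy,
                  state_abstraction, action_grounding, epsilon=0.2):
    """Epsilon-greedy selection whose exploration is guided by the abstract
    policy instead of being purely uniform."""
    if random.random() > epsilon:
        # Exploit: best estimated ground action found so far.
        return max(actions, key=lambda a: q_values.get((state, a), 0.0))
    # Explore: sample an abstract action, then ground it in the target task.
    abstract_action = abstract_policy.sample(state_abstraction(state))
    candidates = action_grounding(abstract_action, state) if abstract_action else []
    return random.choice(candidates) if candidates else random.choice(actions)
```

Under these assumptions, early episodes in the target task already favor actions the abstract policy recommends, while ordinary value updates take over as experience accumulates.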
Cited by 32 publications (15 citation statements)
References 19 publications
“…Furthermore, options learned in a particular setup can be transferred to a new task or domain if they are defined in a problem independent space [29]. Koga et al have proposed an approach to learn abstract policies in order to increase the generalization ability [30,31]. Riano and McGinnity proposed to represent sequences of actions with a finite state automaton and to explore its structure with evolutionary algorithms [32].…”
Section: B. Adapting Representations
confidence: 99%
“…The online transfer framework proposed in [11] uses advice as transferred knowledge, but the difference between tasks was not emphasized. The work [12] treated the source task and the target task as one single problem, so no consideration on the inter-task mapping was given. To avoid the need of inter-task mapping, Laflamm [13] chose the source task and the target task from the same domain with the same state and action space, and then compared three existing transfer learning methods on the Mario AI domain.…”
Section: Literature Review
confidence: 99%
“…While they select a whole policy that they follow for a certain amount of time during training, we estimate the usefulness of policies and mix the policies during training instead of following a single policy per episode. Koga, Freire and Costa (2015) propose to blend multiple policies into a single abstract policy, which is used at the beginning of learning in any new task (whether the new task is similar to the source tasks or not). In spite of following a similar idea, DE-CAF stores multiple concrete policies, and selects only the most promising ones by taking similarity with the target task into consideration.…”
Section: Discussion of Related Work
confidence: 99%
“…• Another important aspect for making gained insights more valuable is generating better abstractions of this knowledge to make it more applicable to a wider range of problems without having to be sure about every detail, as for example shown in Koga, Freire and Costa (2015). Another approach could be to rethink the way we are looking at state values at the moment by factoring in more information than simply the state representation as has been investigated with Universal Value Function Approximators introduced by Schaul et al (2015).…”
Section: Future Work
confidence: 99%