Transfer in variable-reward hierarchical reinforcement learning (2008)
DOI: 10.1007/s10994-008-5061-y

Abstract: Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning …
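
To make the reward structure described in the abstract concrete, here is a minimal Python sketch with hypothetical feature names and values (not taken from the paper): every task in the family shares the same transition dynamics and reward features and differs only in its weight vector, so each task's reward is the dot product of its weights with the shared features.

```python
import numpy as np

# Minimal sketch of the variable-reward setup: all tasks share the same
# transition dynamics and the same reward features phi(s, a); tasks differ
# only in the weight vector w, so R_w(s, a) = w . phi(s, a).
# Feature names and values here are hypothetical, for illustration only.

def reward_features(state, action):
    """Shared feature vector phi(s, a); identical across all tasks."""
    # e.g. [enemy damage dealt, gold collected, time penalty]
    return np.array([1.0, 0.0, -0.1])

def task_reward(weights, state, action):
    """Reward of one task in the family: linear in the shared features."""
    return float(np.dot(weights, reward_features(state, action)))

# Two related tasks: same dynamics and features, different reward weights.
w_source = np.array([0.5, 1.0, 1.0])
w_target = np.array([0.0, 2.0, 1.0])
print(task_reward(w_source, None, None))  # 0.4
print(task_reward(w_target, None, None))  # -0.1
```
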

Cited by 64 publications (52 citation statements); references 13 publications.

Citation statements:
“…Although the authors state that the average optimal Q-function is always the best initialization, our article has shown otherwise. Mehta et al [51] assume fixed task dynamics and reward features, but different reward feature weights in each task. Given the reward weights of a target task, they initialize the task with the value function of the best stored source task policy given the new reward weights.…”
Section: Multi-task Reinforcement Learning
confidence: 99%
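
As a rough illustration of the initialization step described in the excerpt above (a sketch under assumed names, not the paper's actual code): because each task's reward is linear in shared features, the value of a fixed stored policy is likewise linear in the reward weights, so the best stored source policy for a new weight vector can be selected with one dot product per policy.

```python
import numpy as np

# Sketch only (hypothetical names): choose the stored source policy whose
# feature-based value is highest under the target task's reward weights.
# psi_pi[s] holds policy pi's expected discounted feature totals from state s,
# so with rewards linear in the features, V_pi(s) = w . psi_pi[s].

def best_source_policy(stored_feature_values, w_target, start_state):
    """Return the index and value of the best stored policy for w_target."""
    scores = [float(np.dot(w_target, psi[start_state])) for psi in stored_feature_values]
    best = int(np.argmax(scores))
    return best, scores[best]

# Toy example: two stored policies, three states, two reward features.
psi_a = np.array([[4.0, 1.0], [3.0, 0.5], [2.0, 0.0]])
psi_b = np.array([[1.0, 5.0], [0.5, 4.0], [0.0, 3.0]])
w_new = np.array([0.2, 1.0])
idx, value = best_source_policy([psi_a, psi_b], w_new, start_state=0)
print(idx, value)  # -> 1 5.2: the second policy's value function seeds the new task
```
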
“…Unlike other learning paradigms (see Pan and Yang (2010) for a review of the possible settings in supervised learning), an RL problem is defined by different elements such as the dynamics and the reward, and the tasks in M may differ in a number of possible ways depending on the similarities and differences in each of these elements. For instance, in the transfer problem considered by Mehta et al (2008) all the tasks share the same state-action space and dynamics but the reward functions are obtained as linear combinations of basis reward functions and a weight vector. In this case, the space of tasks M is the set of MDPs which can be generated by varying the weights of the reward functions.…”
Section: The Settings
confidence: 99%
“…Example 4. Let us consider a similar scenario to the real-time strategy (RTS) game introduced in (Mehta et al, 2008). In RTS, there is a number of basic tasks such as attacking the enemy, mining gold, building structures, which are useful to accomplish more complex tasks such as preparing an army and conquering an enemy region.…”
Section: Problem Formulation
confidence: 99%
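
The RTS scenario in this excerpt suggests a hierarchical decomposition of complex tasks into basic ones; the toy sketch below (task names are illustrative and not drawn from the cited paper) encodes such a hierarchy as a mapping from composite tasks to subtasks and enumerates the basic tasks beneath a composite task.

```python
# Toy sketch of a hierarchical task decomposition for an RTS-like domain.
# Task names are illustrative only, not taken from Mehta et al. (2008).

rts_hierarchy = {
    "ConquerRegion": ["PrepareArmy", "AttackEnemy"],
    "PrepareArmy": ["MineGold", "BuildStructure"],
    "AttackEnemy": [],      # basic task
    "MineGold": [],         # basic task
    "BuildStructure": [],   # basic task
}

def basic_tasks(root, hierarchy):
    """Collect the basic (leaf) tasks reachable from a composite task."""
    children = hierarchy.get(root, [])
    if not children:
        return [root]
    leaves = []
    for child in children:
        leaves.extend(basic_tasks(child, hierarchy))
    return leaves

print(basic_tasks("ConquerRegion", rts_hierarchy))
# -> ['MineGold', 'BuildStructure', 'AttackEnemy']
```
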
“…Rather than acquiring skills from scratch for each problem it faces, an agent can acquire "portable skills" that it can deploy when facing new problems. Illustrating this benefit of behavioral modularity is beyond the scope of this chapter, and we refer the reader to Guestrin et al (2003), Konidaris et al (2012a), Konidaris and Barto (2007), Liu and Stone (2006), Mehta et al (2008), Taylor et al (2007), and Torrey et al (2008).…”
Section: Behavioral Hierarchy
confidence: 99%