2020
DOI: 10.48550/arxiv.2001.05411
Preprint

Lipschitz Lifelong Reinforcement Learning

Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. We illustrate the benefits…
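The abstract's transfer mechanism invites a concrete illustration. The sketch below is not the paper's implementation: it assumes a precomputed distance `task_distance` between the two MDPs in the paper's metric, a Lipschitz constant `lipschitz_const`, and a known value ceiling `v_max` (all hypothetical names), and shows how a source task's optimal values can seed an optimistic bound for a new task.

```python
import numpy as np

def transfer_upper_bound(v_source, task_distance, lipschitz_const, v_max):
    """Optimistic upper bound on the new task's optimal value function.

    By the Lipschitz result, |V*_new(s) - V*_src(s)| <= L * d(M_new, M_src)
    for every state s, so V*_src(s) + L * d is a valid upper bound on
    V*_new(s); it is additionally capped by the trivial bound v_max.
    """
    return np.minimum(v_source + lipschitz_const * task_distance, v_max)

# Hypothetical usage: values learned on an earlier task seed the new one.
v_src = np.array([0.80, 0.55, 0.92])  # V* of the source task, one entry per state
u = transfer_upper_bound(v_src, task_distance=0.1,
                         lipschitz_const=2.0, v_max=1.0)
# Initializing an optimistic PAC-MDP method (e.g., an R-Max-style scheme)
# from u rather than from v_max shrinks the region left to explore, which
# is the intuition behind the improved convergence rate.
```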

Cited by 1 publication (1 citation statement)
References 6 publications

“…To avoid arbitrary changes, previous works typically require the transition function P_k and the reward function R_k to be Lipschitz smooth over time [39, 30, 40, 16]. And in fact, we can provide a bound on the change in performance given such Lipschitz conditions, as we show below in Theorem 1.…”
Section: Problem Statement
confidence: 96%
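For reference, one common way to write the Lipschitz-smoothness-over-time condition this statement invokes is sketched below; the drift constants L_p and L_r and the indexing by task k are illustrative assumptions, not notation taken from the cited works.

```latex
% Assumed formalization: between consecutive tasks k and k+1, transitions
% and rewards may drift by at most L_p and L_r, uniformly over (s, a).
\|P_{k+1}(\cdot \mid s,a) - P_k(\cdot \mid s,a)\|_1 \le L_p,
\qquad
|R_{k+1}(s,a) - R_k(s,a)| \le L_r,
\quad \forall (s,a),\ \forall k.
```

A performance bound of the kind the statement attributes to its Theorem 1 would then typically scale with L_p and L_r.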