2020
DOI: 10.48550/arxiv.2006.13916
Preprint

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

Abstract: We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning. Our approach stems from the idea that the agent's experience in the source domain should look similar to its experience in the target domain. Building off of a probabilistic view of RL, we formally show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function. This modified reward function is simple to estimate by learning auxiliary classifiers that distinguish source-domain transitions from target-domain transitions. …
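
The reward modification described in the abstract can be written compactly. As a rough sketch (the symbols Δr, p_source, p_target, and the classifier q below are illustrative notation, not taken from this page), the source-domain reward is augmented with the log-ratio of target to source transition probabilities,

    Δr(s_t, a_t, s_{t+1}) = log p_target(s_{t+1} | s_t, a_t) − log p_source(s_{t+1} | s_t, a_t),

and by Bayes' rule this ratio can be estimated with two binary domain classifiers, one conditioned on (s_t, a_t, s_{t+1}) and one on (s_t, a_t):

    Δr = log q(target | s_t, a_t, s_{t+1}) − log q(source | s_t, a_t, s_{t+1}) − log q(target | s_t, a_t) + log q(source | s_t, a_t).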

Cited by 15 publications (26 citation statements)
References 61 publications (76 reference statements)

“…It is likely that some fine-tuning of the policy on real data would greatly increase its robustness in the real environment, and developing a technique which could do so efficiently is one direction for future work. Similarly, domain adaptation techniques could be employed to produce a policy more capable of adapting to the real environment [8,9]. However, ideally the policy could be learned from scratch on the real system; a suitable simulator may not always be available.…”
Section: Discussion
confidence: 99%
“…But the problem studied in Zhang et al. (2020a) is a multi-task setting where the agent aims to learn generalizable abstract states from a series of tasks. Another related topic is domain adaptation in RL (Higgins et al., 2017; Eysenbach et al., 2020; Zhang et al., 2020b), where the target observation space (e.g. real world) is different from the source observation (e.g.…
Section: Related Work
confidence: 99%
“…changed dimension). Moreover, the aim of domain adaptation is usually zero-shot generalization to new observations, so prior knowledge or a few samples of the target domain are often needed (Eysenbach et al., 2020).…”
Section: Related Work
confidence: 99%
“…By training two classifiers to capture the domain difference between the source domain and the target domain, Domain Adaptation with Rewards from Classifiers (DARC) [26] can train a near-optimal policy for the target domain on the source domain. We take inspiration from it, using the same method to capture domain differences and introducing it into imitation learning to solve our problems.…”
Section: Related Work
confidence: 99%
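
To make the two-classifier construction described in the excerpt above concrete, the following is a minimal sketch in Python. It is not code from DARC [26] or from the citing paper; the function name, the use of NumPy, and the example numbers are assumptions for illustration. The substantive point is that the reward correction is the difference between the log-odds of a classifier that sees (s, a, s') and the log-odds of a classifier that sees only (s, a).

import numpy as np

def classifier_reward_correction(logp_sas_target, logp_sas_source,
                                 logp_sa_target, logp_sa_source):
    """Reward correction from two domain classifiers (illustrative sketch).

    The first pair of arguments are log-probabilities from a classifier that
    sees (s, a, s'); the second pair come from a classifier that sees only
    (s, a). Their difference estimates the log-ratio of target to source
    transition dynamics.
    """
    return (logp_sas_target - logp_sas_source) - (logp_sa_target - logp_sa_source)

# Hypothetical classifier outputs for a batch of two source-domain transitions:
logp_sas_target = np.log(np.array([0.7, 0.2]))  # q(target | s, a, s')
logp_sas_source = np.log(np.array([0.3, 0.8]))  # q(source | s, a, s')
logp_sa_target  = np.log(np.array([0.5, 0.5]))  # q(target | s, a)
logp_sa_source  = np.log(np.array([0.5, 0.5]))  # q(source | s, a)

delta_r = classifier_reward_correction(logp_sas_target, logp_sas_source,
                                       logp_sa_target, logp_sa_source)
# delta_r would be added to the source-domain reward before training the policy.
print(delta_r)
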
“…In practice, direct calculation of DD(s_t, a_t, s_{t+1}) is technically infeasible. We draw on the methodology of [26], which calculates the domain shift, since the domain shift quantification in [26] resembles DD(s_t, a_t, s_{t+1}) in form.…”
Section: Off-Dynamics Inverse Reinforcement Learning
confidence: 99%