Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

Eysenbach, Benjamin; Asawa, Swapnil; Chaudhari, Shreyas; Levine, Sergey; Salakhutdinov, Ruslan

doi:10.48550/arxiv.2006.13916

Cited by 15 publications

(26 citation statements)

References 61 publications

(76 reference statements)


“…It is likely that some fine-tuning of the policy on real data would greatly increase its robustness in the real environment, and developing a technique which could do so efficiently is one direction for future work. Similarly, domain adaptation techniques could be employed to produce a policy more capable of adapting to the real environment [8,9]. However, ideally the policy could be learned from scratch on the real system; a suitable simulator may not always be available.…”

Section: Discussion (mentioning)

confidence: 99%

Solving the Real Robot Challenge using Deep Reinforcement Learning

McCarthy, Sanchez, Wang et al. 2021

Preprint


“…But the problem studied in Zhang et al (2020a) is a multi-task setting where the agent aims to learn generalizable abstract states from a series of tasks. Another related topic is domain adaptation in RL (Higgins et al, 2017;Eysenbach et al, 2020;Zhang et al, 2020b), where the target observation space (e.g. real world) is different from the source observation (e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…changed dimension). Moreover, the aim of domain adaptation is usually zero-shot generalization to new observations, thus prior knowledge or a few samples of the target domain is often needed (Eysenbach et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Transfer RL across Observation Feature Spaces via Model-Based Regularization

Sun, Ruijie, Wang et al. 2022

Preprint


In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g. increased number of observable features). However, when the observation space changes, the previous policy will likely fail due to the mismatch of input features, and another policy must be trained from scratch, which is inefficient in terms of computation and sample complexity. Following theoretical insights, we propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task to use as a model-based regularizer. Our algorithm works for drastic changes of observation space (e.g. from vector-based observation to image-based observation), without any inter-task mapping or any prior knowledge of the target task. Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task. * The work was done while the author was an intern at Unity Technologies.


“…By training two classifiers to capture the domain difference between the source domain and the target domain, Domain Adaptation with Rewards from Classifiers (DARC) [26] can train a near-optimal policies for the target domain on source domain. We get inspiration from it, using the same method to capture domain differences, and introducing them into imitation learning to solve our problems.…”

Section: Related Work (mentioning)

confidence: 99%

“…In practice, direct calculation of $DD(s_t, a_t, s_{t+1})$ is technically unfeasible. We draw on [26]'s methodology which calculates the domain shift, since the domain shift quantification in [26] bears resemblance with $DD(s_t, a_t, s_{t+1})$ in form.…”

Section: Off-dynamics Inverse Reinforcement Learning (mentioning)

confidence: 99%

Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain

Kang, Liu, Cao et al. 2021

Preprint


We propose an approach for inverse reinforcement learning from hetero-domain which learns a reward function in the simulator, drawing on the demonstrations from the real world. The intuition behind the method is that the reward function should not only be oriented to imitate the experts, but should encourage actions adjusted for the dynamics difference between the simulator and the real world. To achieve this, the widely used GAN-inspired IRL method is adopted, and its discriminator, recognizing policy-generating trajectories, is modified with the quantification of dynamics difference. The training process of the discriminator can yield the transferable reward function suitable for simulator dynamics, which can be guaranteed by derivation. Effectively, our method assigns higher rewards for demonstration trajectories which do not exploit discrepancies between the two domains. With extensive experiments on continuous control tasks, our method shows its effectiveness and demonstrates its scalability to high-dimensional tasks.
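
The citation statements from this paper describe DARC as training two classifiers to capture the difference between source and target dynamics. A minimal Python sketch of how such classifier outputs could be turned into a reward correction is given here for illustration. It assumes the two classifiers are already trained and return the probability that a transition comes from the target domain, one conditioned on (s, a, s') and one on (s, a) only. The function names `logit`, `reward_correction`, and `corrected_reward` are hypothetical and do not come from any of the cited papers or their code.

```python
import math


def logit(p, eps=1e-8):
    """Log-odds log(p) - log(1 - p), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def reward_correction(p_target_sas, p_target_sa):
    """Classifier-based estimate of the dynamics log-ratio.

    p_target_sas: assumed output of a classifier p(target | s, a, s').
    p_target_sa:  assumed output of a classifier p(target | s, a).
    The difference of log-odds is positive when the transition looks
    more like the target domain than the state-action pair alone suggests.
    """
    return logit(p_target_sas) - logit(p_target_sa)


def corrected_reward(reward, p_target_sas, p_target_sa):
    """Source-domain reward plus the classifier-based correction."""
    return reward + reward_correction(p_target_sas, p_target_sa)


if __name__ == "__main__":
    # A transition that is plausible in both domains: no correction.
    print(corrected_reward(1.0, 0.5, 0.5))
    # A transition the (s, a, s') classifier attributes to the source
    # domain: the reward is penalized.
    print(corrected_reward(1.0, 0.1, 0.5))
```

Under these assumptions, transitions that the simulator produces but the target domain would not are penalized, which discourages a policy trained in the source domain from exploiting them.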
