Combating False Negatives in Adversarial Imitation Learning

Żołna, Konrad; Saharia, Chitwan; Boussioux, Léonard; Hui, David Yu-Tung; Chevalier-Boisvert, Maxime; Bahdanau, Dzmitry; Bengio, Yoshua

doi:10.48550/arxiv.2002.00412

Cited by 1 publication

(1 citation statement)

References 0 publications

“…For instance, Fu et al (2017) explored how rewards can generalize to training policies under changing dynamics. However, most prior work focuses on improving policy generalization to unseen task settings by addressing challenges introduced by the adversarial training objective of GAIL (Xu & Denil, 2019; Zolna et al, 2020; Lee et al, 2021; Barde et al, 2020; Jaegle et al, 2021; Dadashi et al, 2020). Finally, in contrast to most related work on generalization, our work focuses on analyzing and improving reward function transfer to new task settings.…”

Section: Background and Related Work

mentioning

confidence: 99%

BC-IRL: Learning Generalizable Reward Functions from Demonstrations

Szot, Zhang, Batra et al. 2023

Preprint

How well do reward functions learned with inverse reinforcement learning (IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which maximize a maximum-entropy objective, learn rewards that overfit to the demonstrations. Such rewards struggle to provide meaningful rewards for states not covered by the demonstrations, a major detriment when using the reward to learn policies in new situations. We introduce BC-IRL, a new inverse reinforcement learning method that learns reward functions that generalize better when compared to maximum-entropy IRL approaches. In contrast to the MaxEnt framework, which learns to maximize rewards around demonstrations, BC-IRL updates reward parameters such that the policy trained with the new reward matches the expert demonstrations better. We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
