2020
DOI: 10.48550/arxiv.2008.09167
Preprint
Imitation Learning with Sinkhorn Distances

Abstract: Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial in their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a…
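The abstract describes minimizing the Sinkhorn distance between the expert's and learner's occupancy measures. A minimal sketch of that quantity, assuming uniform weights over sampled state-action features and a squared-Euclidean ground cost (the feature construction here is hypothetical, not the paper's exact setup):

```python
import numpy as np

def sinkhorn_distance(x, y, eps=1.0, n_iters=300):
    """Entropy-regularized optimal transport cost between two empirical
    measures with uniform weights, via Sinkhorn-Knopp iterations."""
    # Pairwise squared-Euclidean ground cost matrix.
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)                  # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))     # uniform source weights
    b = np.full(len(y), 1.0 / len(y))     # uniform target weights
    u = np.ones_like(a)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.T @ u)))     # alternating marginal scaling
    v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # transport plan
    return np.sum(P * C)                  # transport cost under P

rng = np.random.default_rng(0)
expert = rng.normal(0.0, 1.0, size=(64, 2))    # hypothetical expert features
learner = rng.normal(2.0, 1.0, size=(64, 2))   # shifted learner features
# A learner whose occupancy differs from the expert's incurs a larger cost.
print(sinkhorn_distance(expert, learner))
```

The regularization strength `eps` trades off fidelity to the unregularized optimal transport cost against numerical stability and convergence speed of the fixed-point iteration.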

Cited by 1 publication (3 citation statements)
References 30 publications
“…Heess et al (2017); Peng et al (2018); Aytar et al (2018) scale imitation learning to complex human-like locomotion and game behavior in non-trivial settings. Our work is an extension of Dadashi et al (2020); Papagiannis & Li (2020) from the Wasserstein to the Gromov-Wasserstein setting. This takes us beyond the limitation that the expert and imitator are in the same domain and into the cross-domain setting between agents that live in different spaces.…”
Section: Related Work
confidence: 99%
“…Remark 2. The construction of our reward proxy is defined for any occupancy measure and extends previous work optimizing optimal transport quantities via RL that assumes a uniform occupancy measure in the form of a trajectory to bypass the need for derivatives through the transition dynamics (Dadashi et al, 2020; Papagiannis & Li, 2020).…”
Section: Proof
confidence: 99%
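The remark quoted above refers to optimizing optimal transport quantities via RL over a trajectory treated as a uniform occupancy measure. One hedged sketch of that idea (the feature vectors and the per-step credit assignment below are illustrative assumptions, not the cited papers' exact constructions): compute a transport plan between learner and expert trajectories, then score each learner step by the negated cost mass it ships to the expert trajectory.

```python
import numpy as np

def sinkhorn_plan(C, eps=1.0, n_iters=500):
    """Sinkhorn-Knopp fixed-point iteration for uniform marginals;
    returns the entropic-regularized transport plan."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.T @ u)))
    v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Treat each trajectory as a uniform empirical measure over its steps.
rng = np.random.default_rng(1)
learner_traj = rng.normal(0.0, 1.0, size=(50, 4))   # hypothetical step features
expert_traj = rng.normal(0.5, 1.0, size=(50, 4))
C = np.sum((learner_traj[:, None] - expert_traj[None, :]) ** 2, axis=-1)
P = sinkhorn_plan(C)
# One (negated-cost) reward per learner step; summing them recovers the
# total transport cost, so an RL agent maximizing them minimizes it.
step_rewards = -np.sum(P * C, axis=1)
```

Because the rewards are computed from samples rather than by differentiating through the transition dynamics, they can be fed to any standard policy-gradient or value-based RL algorithm, which is the point the remark makes.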