2019
DOI: 10.48550/arxiv.1905.12888
Preprint

Imitation Learning as $f$-Divergence Minimization

Cited by 28 publications (40 citation statements)
References 0 publications
“…In ValueDICE, the use of the KL divergence introduces logarithms and exponentials of expectations. We can of course use other divergences, as suggested by [14, 8], but we show below that we can alternatively use a statistical measure (not a divergence), which yields a more feasible objective.…”
Section: Theoretical Analysis and SoftDICE (mentioning)
confidence: 99%
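For context on the quoted remark about "logarithms and exponentials of expectations": ValueDICE-style objectives typically express the KL divergence between the learner's state-action distribution d_π and the expert's d_E through the Donsker-Varadhan variational form. A minimal sketch of that standard identity (the notation d_π, d_E, x is generic and not taken from this report):

\[
\mathrm{D}_{\mathrm{KL}}\!\left(d_{\pi} \,\|\, d_{E}\right)
= \sup_{x:\,\mathcal{S}\times\mathcal{A}\to\mathbb{R}}
\; \mathbb{E}_{(s,a)\sim d_{\pi}}\!\left[x(s,a)\right]
- \log \mathbb{E}_{(s,a)\sim d_{E}}\!\left[e^{x(s,a)}\right].
\]

The logarithm of an expectation of an exponential on the right-hand side is what makes unbiased sample-based estimation of this objective awkward, which is the difficulty the quoted passage alludes to.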
“…Proposition 1 forms the foundation of a distribution-matching approach to imitation learning that learns π by minimizing the divergence between the state-action distribution of the policy, d_π(s, a), and the empirical distribution, d_E(s, a), of state-action pairs in the demonstrations [12, 14, 17, 8].…”
Section: Introduction (mentioning)
confidence: 99%
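The distribution-matching template referred to here can be written in standard f-divergence notation (a generic sketch, not copied from the report) as

\[
\min_{\pi}\; \mathrm{D}_{f}\!\left(d_{\pi} \,\|\, d_{E}\right)
= \min_{\pi}\; \mathbb{E}_{(s,a)\sim d_{E}}\!\left[ f\!\left(\frac{d_{\pi}(s,a)}{d_{E}(s,a)}\right)\right],
\]

where f is convex with f(1) = 0; particular choices of f recover the forward KL, reverse KL, Jensen-Shannon, and other divergences as special cases.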
“…normal distribution), only RKL-RL would acquire one of the local solutions. Inspired by this fact, even imitation learning, which minimizes the forward KL divergence, has been converted into a problem of minimizing the reverse KL divergence (Ke et al., 2019; Uchibe and Doya, 2020). However, it is not clear whether the obtained local solution has sufficient performance.…”
Section: Qualitative Differences Between Forward/Reverse KL Divergences (mentioning)
confidence: 99%
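To make the forward/reverse distinction in this quote explicit, the two directions of the KL divergence between the expert distribution d_E and the learner distribution d_π are (standard definitions, not specific to the quoted papers):

\[
\underbrace{\mathrm{D}_{\mathrm{KL}}\!\left(d_{E} \,\|\, d_{\pi}\right)}_{\text{forward KL}}
= \mathbb{E}_{d_{E}}\!\left[\log\frac{d_{E}}{d_{\pi}}\right],
\qquad
\underbrace{\mathrm{D}_{\mathrm{KL}}\!\left(d_{\pi} \,\|\, d_{E}\right)}_{\text{reverse KL}}
= \mathbb{E}_{d_{\pi}}\!\left[\log\frac{d_{\pi}}{d_{E}}\right].
\]

The forward direction is mode-covering (the learner must place mass wherever the expert does), while the reverse direction is mode-seeking and may settle on a single expert mode, which is the "local solution" behaviour the quote refers to.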
“…Following this suggestion, a new optimization method, so-called FKL-RL, is formulated based on forward KL divergence optimization. Note that, with regard to policy optimization only, it has been pointed out that traditional RL uses the reverse KL divergence, while imitation learning uses the forward KL divergence (Ke et al., 2019; Uchibe and Doya, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
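One common way to make this contrast concrete (an illustration under standard maximum-entropy RL assumptions, not drawn from the quoted paper): soft policy improvement minimizes a reverse KL to a Boltzmann distribution over the soft Q-values, whereas behavioural cloning maximizes demonstration likelihood, which equals a forward KL to the expert policy up to a constant. With α a temperature and Z(s) a normalizer (both introduced here purely for illustration):

\[
\pi_{\mathrm{RL}} \in \arg\min_{\pi}\;
\mathbb{E}_{s}\!\left[
\mathrm{D}_{\mathrm{KL}}\!\left(\pi(\cdot\mid s)\,\Big\|\,\tfrac{1}{Z(s)}\exp\!\left(Q(s,\cdot)/\alpha\right)\right)
\right],
\qquad
\pi_{\mathrm{BC}} \in \arg\min_{\pi}\;
\mathbb{E}_{s\sim d_{E}}\!\left[
\mathrm{D}_{\mathrm{KL}}\!\left(\pi_{E}(\cdot\mid s)\,\|\,\pi(\cdot\mid s)\right)
\right].
\]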
“…IRL is an active research area, with fundamental methods being extended in multiple directions. Our work is complementary to many such extensions, including works considering multiple interacting agents [27], agents with multiple sub-goals [24, 25, 11], skills [32], or options [17], adversarial reward learning to scale to high-dimensional problems [13], or a specialised divergence to match a single intent in a multi-intent dataset [18]. Unlike these works, we consider the problem of learning multiple rewards from a dataset containing several unlabeled behaviour intents [3].…”
Section: Related Work (mentioning)
confidence: 99%