2022
DOI: 10.48550/arxiv.2202.02433
Preprint

Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

Abstract: We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile algorithm for offline imitation learning (IL) via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with dynamics or morph…
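
To make "state-occupancy matching" concrete, the following is a minimal sketch of the quantity being matched, not of SMODICE itself (which works from offline data and, per the abstract, is optimized through Fenchel duality). It computes the discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s) of a policy in a small tabular MDP and compares two occupancies with a KL divergence. The two-state MDP, the function names, and the use of KL as the discrepancy measure are illustrative assumptions, not details taken from the text above.

```python
import math

def state_occupancy(P, pi, mu0, gamma, horizon=2000):
    """Approximate d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s).

    P[s][a][t] is the probability of moving from state s to state t under
    action a, pi[s][a] is the policy, and mu0[s] is the initial distribution.
    The infinite sum is truncated after `horizon` steps.
    """
    n_states, n_actions = len(mu0), len(pi[0])
    dist = list(mu0)              # state distribution at the current step
    occ = [0.0] * n_states
    weight = 1.0 - gamma          # (1 - gamma) * gamma^t, starting at t = 0
    for _ in range(horizon):
        for s in range(n_states):
            occ[s] += weight * dist[s]
        # Push the state distribution one step forward under pi.
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                for t in range(n_states):
                    nxt[t] += dist[s] * pi[s][a] * P[s][a][t]
        dist = nxt
        weight *= gamma
    return occ

def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi_ * math.log(pi_ / qi) for pi_, qi in zip(p, q) if pi_ > 0)

# Hypothetical two-state MDP: action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
mu0 = [1.0, 0.0]
expert = [[0.0, 1.0], [1.0, 0.0]]    # move to state 1, then stay there
uniform = [[0.5, 0.5], [0.5, 0.5]]   # pick either action with equal odds

d_expert = state_occupancy(P, expert, mu0, gamma=0.9)
d_uniform = state_occupancy(P, uniform, mu0, gamma=0.9)
gap = kl(d_uniform, d_expert)        # positive: the occupancies differ
```

Only states enter the comparison, never actions, which is consistent with the abstract's point that expert actions are not required.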

Cited by 1 publication (1 citation statement)
References 20 publications
“…Value Implicit Pre-Training (VIP). VIP (Ma et al, 2022b) learns the optimal goal-conditioned value function via the dual goal-conditioned RL formulation (Ma et al, 2022a;:…”
Section: Preliminaries and Problem Setting
Citation type: mentioning
confidence: 99%