2021
DOI: 10.48550/arxiv.2107.08981
Preprint

Hierarchical Few-Shot Imitation with Skill Transition Models

Abstract: A desirable property of autonomous agents is the ability to both solve long-horizon problems and generalize to unseen tasks. Recent advances in data-driven skill learning have shown that extracting behavioral priors from offline data can enable agents to solve challenging long-horizon tasks with reinforcement learning. However, generalization to tasks unseen during behavioral prior training remains an outstanding challenge. To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algo…
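The abstract describes reusing skills extracted from offline data to imitate a new task from only a few demonstrations. As a minimal, hypothetical sketch (not the paper's actual algorithm; the helper name `retrieve_future_state` and the Euclidean distance metric are illustrative assumptions), one ingredient such hierarchical methods can use is a semi-parametric lookup that matches the agent's current state against the demonstrations and returns a state a few steps ahead, which a skill-conditioned low-level policy would then be asked to reach:

```python
import numpy as np

def retrieve_future_state(state, demo_states, horizon=3):
    """Toy semi-parametric lookup (illustrative sketch only).

    Finds the demonstration state closest to the current state, then
    returns the state `horizon` steps further along the demonstration.
    A skill-conditioned low-level policy could be driven toward the
    returned state to imitate the demonstrated behavior.
    """
    # Euclidean distance from the current state to every demo state.
    dists = np.linalg.norm(demo_states - state, axis=1)
    idx = int(np.argmin(dists))
    # Clamp so we never index past the end of the demonstration.
    target = min(idx + horizon, len(demo_states) - 1)
    return demo_states[target]

# Usage: a 1-D "demonstration" of five states.
demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
print(retrieve_future_state(np.array([1.2]), demo))  # closest is [1.0], 3 steps ahead -> [4.0]
```

In a full method the target state would condition a learned skill posterior rather than being tracked directly; this sketch only shows the retrieval step.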

Cited by 4 publications (7 citation statements)
References 4 publications
“…Furthermore, [17] shows the capability to extract skill abstractions from a demonstration dataset. Skill-prior-based methods [25][26][27], similar to option discovery, extract the skill representation from a demonstration dataset (Fig. 1: Fast Imitation and Policy Reuse Learning).…”
Section: Related Work, A. Hierarchical Reinforcement Learning, Temporal ...
confidence: 99%
“…[10] discusses general policy improvement and general policy evaluation using successor features [11][12]. One-shot and few-shot learning are related topics in fast learning that require only a few examples [28][27]. [28] gives convincing results on one-shot imitation learning using context and attention extraction.…”
Section: B. One-shot Learning, Fast Learning, Imitation Learning
confidence: 99%
“…While our procedure observations can appear to be high-level labels or sub-goals similar to [37,38], we do not make assumptions about the structure of procedure observations, which can simply be scalar variable values computed during program execution. There is also a large body of literature that assumes access to a suboptimal offline dataset in addition to the expert demonstration data collected from the same environment, and conducts representation learning [39,40,41,42,43], hierarchical skill extraction [44,45,46], or dynamics model learning [47,48,49] on the suboptimal offline data followed by imitation learning from an expert. These works commonly assume good coverage in the offline data, which requires large amounts of state-action pairs being collected from running additional policies in the environment.…”
Section: Related Work
confidence: 99%
“…Few approaches for robotic manipulation have a dedicated planning component trainable separately from a controller. One exception are approaches that plan in a skill space [21,35,38,45,50], which PLEX can be modified to do as well. Conceptually, PLEX falls under the paradigm of learning from observations (LfO), but existing LfO approaches don't have multitask zero-shot planning capability [3,36,41,42] or demonstrate it only in low-dimensional environments across similar tasks [51].…”
Section: Introduction
confidence: 99%