One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video

Goo, Wonjoon; Niekum, Scott

doi:10.1109/icra.2019.8793515

Cited by 25 publications

(18 citation statements)

References 20 publications


“…Simulation has also been leveraged as supervision to learn such representations [32] or to produce human data with domain randomization [3]. Finally, meta-learning [54] and subtask discovery [41,20] have also been explored as techniques for acquiring robot rewards or demos from human videos. In contrast to the majority of these works, which usually study a small set of human videos in a similar domain as the robot, we explicitly focus on leveraging "in-the-wild" human videos, specifically large and diverse sets of crowd-sourced videos from the real world from an existing dataset, which contains many different individuals, viewpoints, backgrounds, objects, and tasks.…”

Section: B. Robotic Learning From Human Videos (mentioning)

confidence: 99%

Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos

Chen, Nair, Finn

2021

Robotics: Science and Systems XVII


We are motivated by the goal of generalist robots that can complete a wide range of tasks across many environments. Critical to this is the robot's ability to acquire some metric of task success or reward, which is necessary for reinforcement learning, planning, or knowing when to ask for help. For a general-purpose robot operating in the real world, this reward function must also be able to generalize broadly across environments, tasks, and objects, while depending only on on-board sensor observations (e.g. RGB images). While deep learning on large and diverse datasets has shown promise as a path towards such generalization in computer vision and natural language, collecting high quality datasets of robotic interaction at scale remains an open challenge. In contrast, "in-the-wild" videos of humans (e.g. YouTube) contain an extensive collection of people doing interesting tasks across a diverse range of settings. In this work, we propose a simple approach, Domain-agnostic Video Discriminator (DVD), that learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task, and can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos. We find that by leveraging diverse human datasets, this reward function (a) can generalize zero shot to unseen environments, (b) generalize zero shot to unseen tasks, and (c) can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
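The core idea in this abstract, scoring a candidate robot video by how likely it is to show the same task as a human demonstration, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the DVD implementation: videos are lists of pre-extracted per-frame feature vectors, `embed` is a mean-pooling stand-in for a learned video encoder, and the `weight` and `bias` values in `same_task_score` are hand-picked rather than trained.

```python
import math
from typing import List, Sequence

# A "video" here is a sequence of per-frame feature vectors (hypothetical,
# assumed to be pre-extracted; the real method learns from raw RGB frames).
Video = Sequence[Sequence[float]]


def embed(video: Video) -> List[float]:
    """Mean-pool frame features into one vector (stand-in for a learned encoder)."""
    n_frames = len(video)
    n_dims = len(video[0])
    return [sum(frame[i] for frame in video) / n_frames for i in range(n_dims)]


def same_task_score(a: Video, b: Video, weight: float = 4.0, bias: float = 2.0) -> float:
    """Toy discriminator: sigmoid(bias - weight * distance between embeddings).

    The weight and bias are hand-picked placeholders; a trained discriminator
    would replace this function entirely.
    """
    dist = math.dist(embed(a), embed(b))
    return 1.0 / (1.0 + math.exp(-(bias - weight * dist)))


def pick_best(demo: Video, candidates: List[Video]) -> int:
    """Return the index of the candidate with the highest same-task score."""
    scores = [same_task_score(demo, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    demo = [[0.0, 0.0], [1.0, 1.0]]
    near = [[0.0, 0.1], [1.0, 0.9]]
    far = [[5.0, 5.0], [6.0, 6.0]]
    print(pick_best(demo, [far, near]))
```

In a planning loop, `pick_best` would play the role of ranking sampled action sequences by the score of their predicted videos, which is how a discriminator of this kind can serve as a reward signal.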



“…While, third-person imitation learning uses data from other agents or viewpoints [27,35]. Recent methods for one-shot imitation learning [8,11,13,40,41,42] can translate a single demonstration to an executable policy. The most similar to ours is NTP [41] that also learns long-horizon tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration

Huang, Nair, et al.

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Our goal is to generate a policy to complete an unseen task given just a single video demonstration of the task in a given domain. We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model. To this end, we propose Neural Task Graph (NTG) Networks, which use conjugate task graph as the intermediate representation to modularize both the video demonstration and the derived policy. We empirically show NTG achieves inter-task generalization on two complex tasks: Block Stacking in BulletPhysics and Object Collection in AI2-THOR. NTG improves data efficiency with visual input as well as achieve strong generalization without the need for dense hierarchical supervision. We further show that similar performance trends hold when applied to real-world data. We show that NTG can effectively predict task structure on the JIGSAWS surgical dataset and generalize to unseen tasks.


“…Prior work in LfD has tackled the challenging problem of extracting task plans from a single end-user demonstration [10], [11], [12], [13], [14], [15], [16], [17], [18]. These approaches present intuitive ways for end-users to program complex robot behaviors using kinesthetic teaching [10], virtual reality [11], GUI programming [12], or direct demonstration [18].…”

Section: Related Work (mentioning)

confidence: 99%

Towards Robust One-shot Task Execution using Knowledge Graph Embeddings

Daruna, Nair, Liu, et al.

2021

Preprint


Requiring multiple demonstrations of a task plan presents a burden to end-users of robots. However, robustly executing task plans from a single end-user demonstration is an ongoing challenge in robotics. We address the problem of one-shot task execution, in which a robot must generalize a single demonstration or prototypical example of a task plan to a new execution environment. Our approach integrates task plans with domain knowledge to infer task plan constituents for new execution environments. Our experimental evaluations show that our knowledge representation makes more relevant generalizations that result in significantly higher success rates over tested baselines. We validated the approach on a physical platform, which resulted in the successful generalization of initial task plans to 38 of 50 execution environments with errors resulting from autonomous robot operation included.

