Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.012
Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos

Abstract: We are motivated by the goal of generalist robots that can complete a wide range of tasks across many environments. Critical to this is the robot's ability to acquire some metric of task success or reward, which is necessary for reinforcement learning, planning, or knowing when to ask for help. For a general-purpose robot operating in the real world, this reward function must also be able to generalize broadly across environments, tasks, and objects, while depending only on on-board sensor observations (e.g. RGB images).

Cited by 35 publications (21 citation statements). References 29 publications.
“…Even with powerful reinforcement learning methods, task specification on real robots remains challenging, as engineering rewards on physical systems can be costly and time-consuming [37]. Motivated by this, many works have studied reward specification, including inverse reinforcement learning [38] with robot demonstrations [39,40,41,42,43], learning rewards from user preferences [44,45,46,47], and learning rewards from videos of humans [48,49]. One common approach in visual RL is to learn to reach a goal image using a coarse measure of reward such as negative ℓ2 pixel distance [30,50,51] or temporal distance [52,53,34].…”
Section: Related Work
confidence: 99%
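A minimal sketch of the coarse goal-image reward mentioned in this statement, assuming raw pixel observations; the function name and interface are illustrative, and real systems often compare learned embeddings rather than raw pixels:

```python
import numpy as np

def goal_image_reward(obs_img: np.ndarray, goal_img: np.ndarray) -> float:
    """Coarse goal-reaching reward: negative L2 distance in pixel space.

    Hypothetical sketch of the idea cited above; interface and name
    are illustrative, not taken from the cited papers.
    """
    # Flatten both images and measure Euclidean distance; observations
    # closer to the goal image receive a larger (less negative) reward.
    diff = obs_img.astype(np.float32) - goal_img.astype(np.float32)
    return -float(np.linalg.norm(diff.ravel()))

# Usage: reward = goal_image_reward(current_frame, goal_frame)
```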
“…Many prior works have studied how robots can learn to complete a wide range of tasks from vision. While many approaches to task specification have been taken, including task IDs [54,55], robot and human demonstrations [56,57,58], and meta-learning from rewards [59], a common approach is goal-conditioned learning [60,61,2,1], where an agent learns to reach particular goal states or distributions [62]. Many approaches have been applied in this domain, ranging from goal-conditioned model-free learning [2,63,55,64] with goal relabeling [61] and model-based planning with a learned visual dynamics model [65,66], to methods that combine both [67].…”
Section: Related Work
confidence: 99%
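A minimal sketch of the hindsight goal relabeling referenced in this statement, assuming episodes stored as (obs, action, next_obs) tuples; all names here are illustrative, not from the cited papers:

```python
import random

def relabel_with_hindsight(episode, reward_fn, k=4):
    """Hindsight goal relabeling: treat states actually reached later
    in an episode as if they had been the commanded goal.

    `episode` is a list of (obs, action, next_obs) tuples; `reward_fn`
    scores a (next_obs, goal) pair. Both are assumptions of this sketch.
    """
    relabeled = []
    for t, (obs, action, next_obs) in enumerate(episode):
        # Sample up to k states reached later in the same trajectory
        # and relabel them as goals for this transition.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            goal = random.choice(future)[2]  # a next_obs reached later
            reward = reward_fn(next_obs, goal)
            relabeled.append((obs, action, next_obs, goal, reward))
    return relabeled
```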
“…In this work, we aim to learn visuomotor control on real robots from large datasets of sub-optimal or even random offline data. Model-based RL techniques have been particularly effective in this endeavor [65,58], and in our case all offline data can be used to train a single task-agnostic visual dynamics model. We then use this model with planning to maximize the learned language-conditioned reward R_θ.…”
Section: Learning Language Conditioned Policies With Visual Model Pre…
confidence: 99%
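A minimal sketch of the planning loop this statement describes, using random shooting over a learned dynamics model to maximize a learned reward; `model`, `reward_fn`, and the action bounds are assumptions of this sketch, not the cited paper's actual interfaces:

```python
import numpy as np

def plan_action(model, reward_fn, obs, horizon=10, n_samples=256, action_dim=4):
    """Random-shooting planner: sample action sequences, roll them out
    through a learned dynamics model, and return the first action of
    the sequence with the highest predicted return.

    Assumes `model(obs, action)` predicts the next observation and
    `reward_fn(obs)` is a learned (e.g. language-conditioned) reward.
    """
    best_return, best_action = -np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        sim_obs, total = obs, 0.0
        for a in actions:
            sim_obs = model(sim_obs, a)   # predicted next observation
            total += reward_fn(sim_obs)   # accumulate learned reward
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action  # execute the first action, then replan (MPC-style)
```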
“…This data is large and diverse, spanning scenes from across the globe and tasks ranging from folding clothes to cooking a meal. While the human embodiment in this data differs from that of most robots, prior work [17,18] has found that such human video data can still be useful for learning reward functions. Furthermore, domain gap has not been a major barrier to using pre-trained representations in traditional vision and NLP tasks.…”
Section: Introduction
confidence: 99%
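A minimal sketch of one way human video could inform a reward, assuming embeddings from some pre-trained visual encoder; this illustrates the general idea only, not the actual method of [17,18]:

```python
import numpy as np

def human_video_reward(obs_embedding, success_embeddings):
    """Score a robot observation by its similarity to embeddings of
    human-video frames that show the task completed.

    Both inputs are assumed to come from a pre-trained visual encoder;
    the cited works instead train dedicated reward models on human video.
    """
    # Cosine similarity against each success frame, averaged.
    obs = obs_embedding / (np.linalg.norm(obs_embedding) + 1e-8)
    succ = success_embeddings / (
        np.linalg.norm(success_embeddings, axis=1, keepdims=True) + 1e-8
    )
    return float(np.mean(succ @ obs))
```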