2022
DOI: 10.48550/arxiv.2205.08316
Preprint
Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation

Abstract: In recent years, policy learning methods using either reinforcement or imitation have made significant progress. However, both techniques still suffer from being computationally expensive and requiring large amounts of training data. This problem is especially prevalent in real-world robotic manipulation tasks, where access to ground truth scene features is not available and policies are instead learned from raw camera observations. In this paper, we demonstrate the efficacy of learning image keypoints via the…

Cited by 2 publications (2 citation statements)
References 13 publications
“…For the purpose of downstream policy learning in robotic manipulation tasks, von Hartz et al. [107] describe a technique for learning visual keypoints via dense correspondence. The method learns image keypoints from raw camera observations, making policy learning more effective while addressing the issue of computationally expensive and data-intensive policy learning.…”
Section: Deep RL for Robotic Manipulation
Confidence: 99%
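The dense-correspondence objective referred to in the statement above is typically trained as a pixel-wise contrastive loss over descriptor maps from two views: descriptors of corresponding pixels are pulled together, and non-corresponding ones are pushed apart by a margin. The sketch below is an illustrative numpy version under that assumption; the function name, the hinge form, and the margin value are our own choices, not the paper's exact formulation.

```python
import numpy as np

def pixel_contrastive_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Illustrative dense-correspondence loss (assumption, not the paper's
    exact objective): matched pixel descriptors are pulled together,
    non-matched ones are pushed at least `margin` apart.

    desc_a, desc_b: (H, W, C) dense descriptor maps for two views.
    matches / non_matches: lists of ((ya, xa), (yb, xb)) pixel pairs.
    """
    match_loss = 0.0
    for (ya, xa), (yb, xb) in matches:
        # Squared descriptor distance at corresponding pixels.
        d = np.linalg.norm(desc_a[ya, xa] - desc_b[yb, xb])
        match_loss += d ** 2
    non_match_loss = 0.0
    for (ya, xa), (yb, xb) in non_matches:
        # Hinge: only penalize non-matches closer than the margin.
        d = np.linalg.norm(desc_a[ya, xa] - desc_b[yb, xb])
        non_match_loss += max(0.0, margin - d) ** 2
    return (match_loss / max(len(matches), 1)
            + non_match_loss / max(len(non_matches), 1))
```

In practice the descriptor maps would come from a convolutional encoder and this loss would be minimized over many image pairs; keypoints are then read off as descriptors tracked across observations.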
“…Video Autoencoder [18] uses an autoencoder to learn the 3D structure of a static scene for the task of novel view synthesis. In the context of robotics, self-supervised representation learning has been used for tasks such as depth estimation [5, 8], surface normal estimation [7], optical flow [21, 22], visual-inertial odometry [9], keypoint estimation [41], stereo matching [42], image enhancement [26], and scene flow [13], among many others. These approaches have shown tremendous potential in the real world due to their ability to efficiently scale across multiple locations without needing expensive human intervention.…”
Section: Related Work
Confidence: 99%