2018
DOI: 10.1109/lra.2018.2852779

More Than a Feeling: Learning to Grasp and Regrasp Using Vision and Touch

Abstract: For humans, the process of grasping an object relies heavily on rich tactile feedback. Most recent robotic grasping work, however, has been based only on visual input, and thus cannot easily benefit from feedback after initiating contact. In this paper, we investigate how a robot can learn to use tactile information to iteratively and efficiently adjust its grasp. To this end, we propose an end-to-end action-conditional model that learns regrasping policies from raw visuo-tactile data. This model - a deep, mult…
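
As a rough illustration of the approach described in the abstract (not the authors' released code), the sketch below shows one plausible shape for an action-conditional visuo-tactile model: separate CNN encoders for the camera and tactile images, a small encoder for a candidate regrasp action, and a fused head that scores grasp success. The 4-D action vector, layer sizes, and PyTorch framing are assumptions made for illustration.

```python
# Minimal sketch (not the paper's implementation) of an action-conditional
# visuo-tactile grasp-outcome model: CNN encoders for the RGB and tactile
# images, an MLP for the candidate regrasp action, and a fused head that
# outputs the logit of grasp success. Sizes and the 4-D action are assumptions.
import torch
import torch.nn as nn


def conv_encoder(in_channels: int) -> nn.Sequential:
    """Small CNN that maps an image to a flat 64-D feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )


class VisuoTactileGraspModel(nn.Module):
    def __init__(self, action_dim: int = 4):
        super().__init__()
        self.rgb_enc = conv_encoder(3)       # camera image
        self.tactile_enc = conv_encoder(3)   # GelSight-style tactile image
        self.action_enc = nn.Sequential(nn.Linear(action_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(64 + 64 + 64, 128), nn.ReLU(),
            nn.Linear(128, 1),               # logit of grasp success
        )

    def forward(self, rgb, tactile, action):
        z = torch.cat(
            [self.rgb_enc(rgb), self.tactile_enc(tactile), self.action_enc(action)],
            dim=-1,
        )
        return self.head(z)


if __name__ == "__main__":
    model = VisuoTactileGraspModel()
    rgb = torch.randn(2, 3, 64, 64)
    tactile = torch.randn(2, 3, 64, 64)
    action = torch.randn(2, 4)
    print(torch.sigmoid(model(rgb, tactile, action)).shape)  # (2, 1)
```

In such a setup, regrasping amounts to scoring several candidate actions with the model and executing the one with the highest predicted success probability.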

Cited by 289 publications (200 citation statements)
References 40 publications

“…They reported a success rate of 85.92% with a deep network and 84.5% with an SVM. A similar approach was followed by Calandra et al. [40] in their work on using tactile sensing in robotic grasp detection.…”
Section: Multi-modal Data
confidence: 95%
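
The statement above compares a deep network with an SVM for predicting grasp outcomes from tactile data. Purely as an illustration of the SVM side of such a pipeline, the sketch below trains a binary grasp-success classifier on synthetic tactile features; the feature size, labels, and data are placeholders, not the datasets used in the cited works.

```python
# Illustrative SVM baseline for grasp-success classification from flattened
# tactile features. Synthetic data stands in for real tactile recordings.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 256))             # e.g. pooled tactile-image features
y = (X[:, :8].sum(axis=1) > 0).astype(int)   # stand-in success/failure labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```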
“…This representation described a grasp in a 2D image plane. This representation was improved by Calandra et al. to include the 3D depth information by adding the z coordinate, resulting in the grasp representation G_z = (x, y, z, θ) [40,41]. The G_z grasp representation was also used by Murali et al. [42] in their approach to detect robotic grasps through the use of tactile feedback and visual sensing.…”
Section: Grasp Representation
confidence: 99%
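
For concreteness, the two representations mentioned above, the planar image-space grasp (x, y, θ) and its depth-augmented extension G_z = (x, y, z, θ), can be written as simple data structures. The field names and example values below are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of the grasp representations discussed above: a planar
# image-space grasp (x, y, theta) and its 3-D extension G_z = (x, y, z, theta).
from dataclasses import dataclass


@dataclass
class PlanarGrasp:
    x: float      # grasp centre, image u coordinate
    y: float      # grasp centre, image v coordinate
    theta: float  # gripper orientation about the vertical axis (rad)


@dataclass
class Grasp3D(PlanarGrasp):
    z: float = 0.0  # depth / height of the grasp point

    def as_tuple(self):
        return (self.x, self.y, self.z, self.theta)


g = Grasp3D(x=0.12, y=0.34, theta=1.57, z=0.05)
print(g.as_tuple())  # (0.12, 0.34, 0.05, 1.57)
```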
“…Even fewer approaches exploit the complementary nature of vision and touch. Some of them extend their previous work on grasp stability estimation [5,13]. Others perform full manipulation tasks based on multiple input modalities [1,20,31] but require a pre-specified manipulation graph [31], demonstrate only on one task [20,31], or require human demonstration and object CAD models [1].…”
Section: A. Contact-rich Manipulation
confidence: 99%
“…[43,61] fuse vision and range sensing, and [61] adds language labels. While many of these multimodal approaches are trained through a classification objective [5,13,26,70], in this paper we are interested in multimodal representation learning for control. Figure 2: Neural network architecture for multimodal representation learning with self-supervision.…”
Section: Multimodal Learning
confidence: 99%
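
To make the distinction drawn above concrete, the hedged sketch below shows a single fused visuo-tactile representation feeding either a supervised classification head (e.g. grasp stability) or a self-supervised head used to shape the representation for downstream control. The encoder sizes and the particular self-supervised target (whether the two modalities are temporally aligned) are assumptions, not details taken from the cited works.

```python
# Sketch of one fused visuo-tactile representation with two possible heads:
# (a) a supervised grasp-stability classifier, (b) a self-supervised
# modality-alignment predictor used to learn the representation itself.
import torch
import torch.nn as nn


class FusionBackbone(nn.Module):
    def __init__(self, vision_dim=128, touch_dim=32, latent_dim=64):
        super().__init__()
        self.vision = nn.Sequential(nn.Linear(vision_dim, 64), nn.ReLU())
        self.touch = nn.Sequential(nn.Linear(touch_dim, 64), nn.ReLU())
        self.fuse = nn.Sequential(nn.Linear(128, latent_dim), nn.ReLU())

    def forward(self, v, t):
        return self.fuse(torch.cat([self.vision(v), self.touch(t)], dim=-1))


backbone = FusionBackbone()
stability_head = nn.Linear(64, 1)   # (a) supervised grasp-stability logit
alignment_head = nn.Linear(64, 1)   # (b) self-supervised "aligned?" logit

v, t = torch.randn(4, 128), torch.randn(4, 32)
z = backbone(v, t)
print(stability_head(z).shape, alignment_head(z).shape)  # both (4, 1)
```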