Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

Zeng, Andy; Song, Shuran; Yu, Kuan-Ting; Donlon, Elliott; Hogan, Francois R.; Bauzá, Maria; Ma, Dongli; Taylor, Orion; Liu, Melody; Romo, Eudald; Fazeli, Nima; Alet, Ferran; Dafle, Nikhil Chavan; Holladay, Rachel; Morona, Isabella; Nair, Prem Qu; Green, Druck; Taylor, Ian; Liu, Weber; Rodríguez, Alberto

doi:10.48550/arxiv.1710.01330

Cited by 10 publications

(12 citation statements)

References 30 publications

“…For example, some designs used two or more grippers in one robotic hand [4]. The grippers can be fixed to tuning turrets [5], or they can have one or more Degrees of Freedom (DoFs) relative to each other [6], [7]. Some other designs used fully actuated [8] or underactuated anthropomorphic hands [9], [10], [11].…”

Section: Introduction (mentioning)

confidence: 99%

“…Some other designs used fully actuated [8] or underactuated anthropomorphic hands [9], [10], [11]. Specifically, Zeng et al. [6] developed a gripper with a retractable mechanism to allow switching between a parallel gripper and a suction gripper. Cannella et al. [12] and Chen et al. [13] developed industrial grippers with twisting ability for high-speed assembly.…”

Section: Introduction (mentioning)

confidence: 99%

A Double Jaw Hand Designed for Multi-Object Assembly

Triyonoputro, Wan, Harada (2018)

2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)

This paper presents a double jaw hand for industrial assembly. The hand comprises two orthogonal parallel grippers with different mechanisms. The inner gripper is built on a crank-slider mechanism, which is compact and able to firmly hold objects like shafts. The outer gripper is built on a parallelogram mechanism with a large stroke to hold big objects like pulleys. The two grippers are connected by a prismatic joint along the hand's approaching vector. The hand is able to hold two objects and perform in-hand manipulation like pull-in (insertion) and push-out (ejection). This paper presents the detailed design and implementation of the hand, and demonstrates its advantages through experiments on two sets of peg-in-multi-hole assembly tasks, as parts of the World Robot Challenge (WRC) 2018, using a bimanual robot.

“…Multiple RGBD images across space can also be integrated to produce such explicit representations [31]. The latter approach is often used to obtain a 3D scene representation in grasping tasks ([24], [25]). In contrast to these methods, neural-based algorithms learn implicit representations of a scene.…”

Section: B. Multiple View Object and Scene Representation Learning (mentioning)

confidence: 99%

Active Perception and Representation for Robotic Manipulation

Zaky, Paruthi, Tripp et al. (2020)

Preprint

The vast majority of visual animals actively control their eyes, heads, and/or bodies to direct their gaze toward different parts of their environment [3]. In contrast, recent applications of reinforcement learning in robotic manipulation employ cameras as passive sensors. These are carefully placed to view a scene from a fixed pose. Active perception allows animals to gather the most relevant information about the world and focus their computational resources where needed. It also enables them to view objects from different distances and viewpoints, providing a rich visual experience from which to learn abstract representations of the environment. Inspired by the primate visual-motor system, we present a framework that leverages the benefits of active perception to accomplish manipulation tasks. Our agent uses viewpoint changes to localize objects, to learn state representations in a self-supervised manner, and to perform goal-directed actions. We apply our model to a simulated grasping task with a 6-DoF action space. Compared to its passive, fixed-camera counterpart, the active model achieves 8% better performance in targeted grasping. Compared to vanilla deep Q-learning algorithms [44], our model is at least four times more sample-efficient, highlighting the benefits of both active perception and representation learning.

“…However, these methods tend to produce average grasps which are invalid for certain symmetric objects [2]. Recently, methods such as [4], [5], [13]-[15] used auto-encoders to generate grasp poses at every pixel. They demonstrated higher grasp accuracy compared to the global methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…Other methods such as [3] focused on learning grasps at patch-level by extracting patches (of different sizes) from the image and predicting a grasp for each patch. Recently, methods such as [4], [5] used auto-encoders to learn grasp parameters at each pixel in the image. They showed that one-to-one mapping (of image data to ground truth grasps) at the pixel-level can effectively be learnt using small CNN structures to achieve fast inference speed.…”

Section: Introduction (mentioning)

confidence: 99%

Densely Supervised Grasp Detector (DSGD)

Asif, Tang, Harrer (2019)

AAAI

This paper presents Densely Supervised Grasp Detector (DSGD), a deep learning framework which combines CNN structures with layer-wise feature fusion and produces grasps and their confidence scores at different levels of the image hierarchy (i.e., global-, region-, and pixel-levels). Specifically, at the global-level, DSGD uses the entire image information to predict a grasp. At the region-level, DSGD uses a region proposal network to identify salient regions in the image and predicts a grasp for each salient region. At the pixel-level, DSGD uses a fully convolutional network and predicts a grasp and its confidence at every pixel. During inference, DSGD selects the most confident grasp as the output. This selection from hierarchically generated grasp candidates overcomes limitations of the individual models. DSGD outperforms state-of-the-art methods on the Cornell grasp dataset in terms of grasp accuracy. Evaluation on a multi-object dataset and real-world robotic grasping experiments show that DSGD produces highly stable grasps on a set of unseen objects in new environments. It achieves 97% grasp detection accuracy and 90% robotic grasping success rate with real-time inference speed.
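The inference step described in the abstract above (generate grasp candidates at the global, region, and pixel levels, then output the most confident one) can be illustrated with a minimal Python sketch. This is not the DSGD implementation: the `GraspCandidate` type, its fields, and the `select_most_confident` helper are hypothetical names introduced only for illustration, and the confidence values are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one predicted grasp (illustrative only)."""
    level: str         # "global", "region", or "pixel"
    x: float           # grasp centre, image coordinates
    y: float
    angle_deg: float   # gripper orientation in the image plane
    width: float       # gripper opening
    confidence: float  # score predicted alongside the grasp


def select_most_confident(candidates: Sequence[GraspCandidate]) -> GraspCandidate:
    """Return the candidate with the highest confidence across all levels."""
    if not candidates:
        raise ValueError("at least one grasp candidate is required")
    return max(candidates, key=lambda c: c.confidence)


if __name__ == "__main__":
    # Made-up candidates, one per level of the hierarchy.
    candidates = [
        GraspCandidate("global", 120.0, 96.0, 15.0, 40.0, 0.62),
        GraspCandidate("region", 131.0, 90.0, 22.0, 36.0, 0.81),
        GraspCandidate("pixel", 128.0, 92.0, 20.0, 35.0, 0.93),
    ]
    best = select_most_confident(candidates)
    print(best.level, best.confidence)
```

Because the selection is a plain maximum over a pooled list, a level that is unreliable for a given image is overruled whenever another level is more confident. This is one reading of how the abstract's hierarchical selection "overcomes limitations of the individual models".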
