2017 IEEE International Conference on Robotics and Automation (ICRA) 2017
DOI: 10.1109/icra.2017.7989165

Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge

Abstract: Robot warehouse automation has attracted significant interest in recent years, perhaps most visibly in the Amazon Picking Challenge (APC) [1]. A fully autonomous warehouse pick-and-place system requires robust vision that reliably recognizes and locates objects amid cluttered environments, self-occlusions, sensor noise, and a large variety of objects. In this paper we present an approach that leverages multiview RGB-D data and self-supervised, data-driven learning to overcome those difficulties. The approach w…

Cited by 458 publications (301 citation statements)
References 22 publications
“…Conversely, in the third example (Figure 11, third column), the left hand and object are correctly predicted to be separated. We also found that hands and objects can be detected in isolation, which reaffirms similar observations made in previous works [7,28].…”
Section: Detecting Unknown Objects; citation type: supporting
confidence: 92%
“…These are then used in a complex multi-stage classification scheme for offline action recognition, while our approach uses the FCN outputs to discriminate the hand from the object and to generate pixel labels for real-time tracking. Similar to our approach, [28] used FCNs to discriminate between objects for 6D pose estimation. While this approach produces good results for object localization in cluttered environments, it requires a multicamera setup and does not achieve real-time performance, which is crucial for dynamic hand-object interactions.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Taking an object pose estimation task [10] as an example, the input data includes a set of images and the total size of it could be hundreds of kilobytes at least, while the output data is just the object location and pose, and takes a few dozens of bytes at most.…”
Section: Local Computing; citation type: mentioning
confidence: 99%
“…In addition to these works on 2D segmentation, three-dimentional segmentation is required for robot to conduct tasks in the real world. In order to achieve this, previous works propose projection-based approach projecting segmented pixels to 3D points in a single view (2.5D) [9], mapping-based approach with binary object existence [12] and probabilistic existence [1] for a single target object. And as for fully 3D-based approach, model matching is tackled [13] [14] using various 3D features [15] [16].…”
Section: D Multilabel Mapping For Object Segmentation and Manipu…; citation type: mentioning
confidence: 99%