SegICP: Integrated deep semantic segmentation and pose estimation

Wong, Jay Ming; Kee, Vincent; Le, Tiffany; Wagner, Syler; Mariottini, Gian-Luca; Schneider, Anja; Hamilton, Lei; Chipalkatty, Rahul; Hebert, Mitchell; Johnson, David M.; Wu, Jimmy; Zhou, Bolei; Torralba, Antonio

doi:10.1109/iros.2017.8206470

Cited by 118 publications

(83 citation statements)

References 32 publications

“…With the availability of powerful commodity GPUs, and fast detection algorithms [27,38], these methods are suitable for realtime object detection required in robotics. More recently, deep learning based approaches in computer vision are being adopted for the task of pose estimation of specific objects [33,53,54]. Improving instance detection and pose estimation in warehouses will be significantly useful for the perception pipeline in systems trying to solve the Amazon Picking Challenge [7].…”

Section: Related Work (mentioning)

confidence: 99%

Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

Dwibedi, Misra, Hebert (2017)

2017 IEEE International Conference on Computer Vision (ICCV)

A major impediment in rapidly deploying object detection models for instance detection is the lack of large annotated datasets. For example, finding a large labeled dataset containing instances in a particular kitchen is unlikely. Each new environment with new instances requires expensive data collection and annotation. In this paper, we propose a simple approach to generate large annotated instance datasets with minimal effort. Our key insight is that ensuring only patch-level realism provides enough training signal for current object detector models. We automatically 'cut' object instances and 'paste' them on random backgrounds. A naive way to do this results in pixel artifacts which result in poor performance for trained models. We show how to make detectors ignore these artifacts during training and generate data that gives competitive performance on real data. Our method outperforms existing synthesis approaches and when combined with real images improves relative performance by more than 21% on benchmark datasets. In a cross-domain setting, our synthetic data combined with just 10% real data outperforms models trained on all real data.

“…The dataset consists of indoor scenes with 10 categories of densely annotated objects relevant to an automotive oil change, such as oil bottles, funnels, and engines. Images were captured with one of three sensor types (Microsoft Kinect1, Microsoft Kinect2, or Asus Xtion Pro Live) and were automatically annotated with object poses and pixelwise instance masks using either the motion capture setup described in [13] or the LabelFusion [15] pipeline.…”

Section: A Datasets (mentioning)

confidence: 99%

“…Object poses are expensive to annotate and were often hand annotated in the past [4], [12]. More recently, automatic annotation methods have been proposed using motion capture [13] or 3D scene reconstruction [14], [15], but these methods still require significant human labor and are not able to generate significant variability in pose since objects must remain stationary during data capture. To address this issue, we propose a novel pose estimation approach that leverages synthetic pose data.…”

Section: Introduction (mentioning)

confidence: 99%

Real-Time Object Pose Estimation with Pose Interpreter Networks

Wu, Zhou, Russell, et al. (2018)

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Self-citation

In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively annotated object pose data, our pose interpreter network is trained entirely on synthetic pose data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.

“…The impressive development of deep neural networks, especially convolutional neural networks (CNNs; Garcia‐Garcia, Orts‐Escolano, Oprea, Villena‐Martinez, & Garcia‐Rodriguez, ), has led to a significant improvement in semantic segmentation approaches in recent years. Many robotic applications benefited from these improvements, for example, autonomous driving (Luc, Neverova, Couprie, Verbeek, & LeCun, ) and object detection and manipulation (Wong et al, 2017). For training, however, these methods require extensive amounts of pixel‐level labeled data.…”

Section: Introduction (mentioning)

confidence: 99%

CoralSeg: Learning coral segmentation from sparse annotations

Alonso, Yuval, Eyal, et al. (2019)

Journal of Field Robotics

Robotic advances and developments in sensors and acquisition systems facilitate the collection of survey data in remote and challenging scenarios. Semantic segmentation, which attempts to provide per‐pixel semantic labels, is an essential task when processing such data. Recent advances in deep learning approaches have boosted this task's performance. Unfortunately, these methods need large amounts of labeled data, which is usually a challenge in many domains. In many environmental monitoring instances, such as the coral reef example studied here, data labeling demands expert knowledge and is costly. Therefore, many data sets often present scarce and sparse image annotations or remain untouched in image libraries. This study proposes and validates an effective approach for learning semantic segmentation models from sparsely labeled data. Based on augmenting sparse annotations with the proposed adaptive superpixel segmentation propagation, we obtain similar results as if training with dense annotations, significantly reducing the labeling effort. We perform an in‐depth analysis of our labeling augmentation method as well as of different neural network architectures and loss functions for semantic segmentation. We demonstrate the effectiveness of our approach on publicly available data sets of different real domains, with the emphasis on underwater scenarios—specifically, coral reef semantic segmentation. We release new labeled data as well as an encoder trained on half a million coral reef images, which is shown to facilitate the generalization to new coral scenarios.
