A self-supervised learning system for object detection using physics simulation and multi-view pose estimation

Mitash, Chaitanya; Bekris, Kostas E.; Boularias, Abdeslam

doi:10.1109/iros.2017.8202206

Cited by 98 publications

(64 citation statements)

References 38 publications

“…Other lines of work utilize photo-realistic rendering and realistic scene compositions to overcome the domain gap by synthesizing images that match the real world as close as possible [9,13,25,17,1,8,33,18]. While these methods have shown promising results they face many hard challenges.…”

Section: Related Work (mentioning)

confidence: 99%

An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection

Hinterstoißer, Pauly, Heibel et al. 2019

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)

Deep learning methods typically require vast amounts of training data to reach their full potential. While some publicly available datasets exist, domain-specific data always needs to be collected and manually labeled, an expensive, time-consuming, and error-prone process. Training with synthetic data is therefore very lucrative, as dataset creation and labeling come for free. We propose a novel method for creating purely synthetic training data for object detection. We leverage a large dataset of 3D background models and densely render them using full domain randomization. This yields background images with realistic shapes and texture on top of which we render the objects of interest. During training, the data generation process follows a curriculum strategy guaranteeing that all foreground models are presented to the network equally under all possible poses and conditions with increasing complexity. As a result, we entirely control the underlying statistics and we create optimal training samples at every stage of training. Using a set of 64 retail objects, we demonstrate that our simple approach enables the training of detectors that outperform models trained with real data on a challenging evaluation dataset.

“…Semi-supervised learning (Blum and Mitchell, 1998;Joachims, 1999) addresses this problem by making use of a large amount of unlabeled data and a small amount of labeled data. Similarly, as an autonomous supervised learning approach, self-supervised learning (Mitash et al, 2017) usually extracts and uses the naturally available relevant context and embedded meta data as supervisory signals. Active learning (Arasu et al, 2010;Bellare et al, 2012) is another special case of supervised learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points.…”

Section: Machine Learning Paradigms (mentioning)

confidence: 99%

Gradual Machine Learning for Entity Resolution

Hou, Chen, Shen et al. 2019

The World Wide Web Conference

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require lots of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work, and are therefore not readily available in many real scenarios. In this paper, we propose a novel learning paradigm for ER, called gradual machine learning, which aims to enable effective machine labeling without the requirement for manual labeling effort. It begins with some easy instances in a task, which can be automatically labeled by the machine with high accuracy, and then gradually labels more challenging instances by iterative factor graph inference. In gradual machine learning, the hard instances in a task are gradually labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments on real data have shown that the performance of the proposed approach is considerably better than its unsupervised alternatives, and highly competitive compared to the state-of-the-art supervised techniques. Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other challenging classification tasks requiring extensive labeling effort.

“…Even though this is promising, as data-driven approaches could help close the gap between 3D geometric models and noisy observed data, it needs access to a large model-aligned 3D training dataset, which may be difficult to collect. Another technique often used is to perform object segmentation using CNNs trained specifically for the setup [11], [2], [19] and perform point cloud registration methods [Figure caption: The image describes the process of hypotheses generation for objects present in the scene. The process starts with extracting object segments S_{1:3} using Faster-RCNN [6], followed by using a global point cloud registration technique [4] to compute a set of possible model transformations (T_{1:3}) that correspond to the respective segments.]…”

Section: B Progress In Deep Learning (mentioning)

confidence: 99%

“…In order to address the issue of potentially conflicting candidate object poses, the scene hypotheses are dynamically constructed by introducing a constrained local optimization step over candidate object poses returned by Super4PCS, a fast global model matching method [4]. To limit detection errors that arise in cluttered scene, the proposed method builds on top of a previous contribution [11], which performs clutter-specific autonomous training to get object segments. This paper provides experimental indications that the set of candidate object poses returned by Super4PCS given the clutter-aware training contains object poses that are close enough to the ground truth, however, these might not be the ones that receive the best matching score according to Super4PCS.…”

Section: Introduction (mentioning)

confidence: 99%

Improving 6D Pose Estimation of Objects in Clutter Via Physics-Aware Monte Carlo Tree Search

Mitash, Boularias, Bekris 2018

2018 IEEE International Conference on Robotics and Automation (ICRA)

Self Cite

This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best scored pose per object by using these techniques may not be accurate due to overlaps and occlusions. Nevertheless, experimental indications provided in this work show that object poses with lower ranks may be closer to the real poses than ones with high ranks according to registration techniques. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched so as to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered so as to reduce their number but still keep a sufficient diversity. Then, searching over the combinations of candidate object poses is performed through a Monte Carlo Tree Search (MCTS) process that uses the similarity between the observed depth image of the scene and a rendering of the scene given the hypothesized pose as a score that guides the search procedure. MCTS handles in a principled way the tradeoff between fine-tuning the most promising poses and exploring new ones, by using the Upper Confidence Bound (UCB) technique. Experimental results indicate that this process is able to quickly identify in cluttered scenes physically-consistent object poses that are significantly closer to ground truth compared to poses found by point cloud registration methods.
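
The tradeoff that the abstract above attributes to the Upper Confidence Bound can be illustrated with the standard UCB1 selection rule. The following is a minimal sketch only, not the cited paper's implementation: the function names, the exploration constant `c`, and the `(total_reward, visits)` layout are assumptions made for illustration, and the reward is a placeholder for whatever score guides the search (the abstract uses a similarity between the observed and rendered depth images).

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean reward plus an exploration bonus.

    The exploration constant `c` is an assumed default, not a value
    taken from the cited work.
    """
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs; this layout
    is hypothetical and chosen only to keep the sketch short.
    """
    parent_visits = sum(visits for _, visits in children)
    best_index, best_score = 0, -math.inf
    for i, (total_reward, visits) in enumerate(children):
        mean = total_reward / visits if visits else 0.0
        score = ucb1_score(mean, visits, parent_visits, c)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With `c = 0` the rule reduces to greedy selection of the best mean reward; a larger `c` shifts the choice toward rarely visited candidates, which is the exploration side of the tradeoff.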
