EasyLabel: A Semi-Automatic Pixel-wise Object Annotation Tool for Creating Robotic RGB-D Datasets

Suchi, Markus; Patten, Timothy; Fischinger, David; Vincze, Markus

doi:10.1109/icra.2019.8793917

Cited by 86 publications (80 citation statements)

References 23 publications


“…This is the common approach at large companies working on the challenge of autonomous driving, for example, Google¹ or Uber.² A similar approach has been taken for the robotics environment to build up a dataset of indoor scenes in [9]. However, this approach is limited, since it will be difficult to create examples of all possible scenarios that one might ever encounter.…”

Section: Situated Robot Perception: Embodied AI From the View Of … (mentioning, confidence: 99%)

“…With the detection of larger structures such as cupboards, desks and tables, and the detection of surfaces, the task of detecting objects is enhanced with the dimension to find clusters of data points that stick out of the plane and possibly present one or more objects [9]. This simplifies object detection or can be viewed as presenting a second step of verification to the detection step.…”

Section: Situated Robot Perception: Embodied AI From the View Of … (mentioning, confidence: 99%)


Learn, detect, and grasp objects in real-world settings. Vincze, Patten, Park, et al. 2020. Elektrotech. Inftech. (self-citation)

Experts predict that future robot applications will require safe and predictable operation: robots will need to be able to explain what they are doing to be trusted. To reach this goal, they will need to perceive their environment and its objects to better understand the world and the tasks they have to perform. This article gives an overview of present advances, focusing on options to learn, detect, and grasp objects. With the advent of colour and depth (RGB-D) cameras and the advances in AI and deep learning methods, robot vision has progressed considerably in recent years. We summarise recent results for pose estimation of objects and work on verifying object poses using a digital twin and physics simulation. The idea is that any hypothesis from an object detector and pose estimator is verified, leveraging the continuous advances in deep learning approaches to create object hypotheses. We then show that the object poses are robust enough for a mobile manipulator to approach the object and grasp it. We intend to show that it is now feasible to model, recognise, and grasp many objects with good performance, though further work is needed for applications in industrial settings.


“…This method includes an interactive tool to correct prediction errors. EasyLabel [20] is a semi-automatic method for annotating objects on the RGB-D table-top setting. Label-Fusion [21] is another semi-automatic method for generating large quantities of semantic labels from RGB-D videos.…”

Section: A Semantic Segmentation (mentioning, confidence: 99%)

Automatic Dense Annotation for Monocular 3D Scene Understanding. … et al. 2020

Deep neural networks have revolutionized many areas of computer vision, but they require notoriously large amounts of labeled training data. For tasks such as semantic segmentation and monocular 3D scene layout estimation, collecting high-quality training data is extremely laborious because dense, pixel-level ground truth is required and must be annotated by hand. In this paper, we present two techniques for significantly reducing the manual annotation effort involved in collecting large training datasets. The tools are designed to allow rapid annotation of entire videos collected by RGB-D cameras, thus generating thousands of ground-truth frames to use for training. First, we propose a fully-automatic approach to produce dense pixel-level semantic segmentation maps. The technique uses noisy evidence from pre-trained object detectors and scene layout estimators and incorporates spatial and temporal context in a conditional random field formulation. Second, we propose a semi-automatic technique for dense annotation of 3D geometry, and in particular, the 3D poses of planes in indoor scenes. This technique requires a human to quickly annotate just a handful of keyframes per video, and then uses the camera poses and geometric reasoning to propagate these labels through an entire video sequence. Experimental results indicate that the technique could be used as an alternative or complementary source of training data, allowing large-scale data to be collected with minimal human effort.


“…The experimental results show that our method outperforms the existing approaches and establishes new state-of-the-art results for both datasets. In order to further consolidate the effectiveness of our method, we adapt an object segmentation dataset, called Object Clutter Indoor Dataset (OCID) [13], to the instance recognition task to further evaluate RCFusion. OCID has been recently released to provide object scenes with high level of clutter and occlusion, arguably two of the biggest challenges faced by robotic visual perception systems [14].…”

Section: Introduction (mentioning, confidence: 99%)

“…Accuracy (%) of DECO [7] and variations of RCFusion on Object Clutter Indoor Dataset [13]. "RCFusion -res5" refers to the variation of RCFusion without extracting features from multiple layers, i.e.…”

(mentioning, confidence: 99%)

Recurrent Convolutional Fusion for RGB-D Object Recognition. Loghmani, Caputo, Vincze. 2019. IEEE Robot. Autom. Lett. (self-citation)

Providing robots with the ability to recognize objects like humans has always been one of the primary goals of robot vision. The introduction of RGB-D cameras has paved the way for a significant leap forward in this direction thanks to the rich information provided by these sensors. However, the robot vision community still lacks an effective method to synergistically use the RGB and depth data to improve object recognition. In order to take a step in this direction, we introduce a novel end-to-end architecture for RGB-D object recognition called recurrent convolutional fusion (RCFusion). Our method generates compact and highly discriminative multi-modal features by combining RGB and depth information representing different levels of abstraction. Extensive experiments on two popular datasets, RGB-D Object Dataset and JHUIT-50, show that RCFusion significantly outperforms state-of-the-art approaches in both the object categorization and instance recognition tasks. In addition, experiments on the more challenging Object Clutter Indoor Dataset confirm the validity of our method in the presence of clutter and occlusion. The code is publicly available at: https://github.com/MRLoghmani/rcfusion.
