Fast 6D pose estimation for texture-less objects from a single RGB image

Muñoz, Enrique; Konishi, Yukiko; Murino, Vittorio; Del Bue, Alessio

doi:10.1109/icra.2016.7487781

Cited by 37 publications (17 citation statements)

References 23 publications


“…Crivellaro et al. [12] supply 3D CAD models and annotated RGB sequences with 3 highly occluded and texture-less objects. Muñoz et al. [36] provide RGB sequences of 6 texture-less objects that are each imaged in isolation against a clean background and without occlusion. Further to the above, there exist RGB datasets such as [13,50,38,25], for which the ground truth is provided only in the form of 2D bounding boxes.…”

Section: Depth-only and RGB-only Datasets (mentioning; confidence: 99%)

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects

Hodaň, Haluza, Obdržálek, et al. (2017)

2017 IEEE Winter Conference on Applications of Computer Vision (WACV)



We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less. The visual appearance of a texture-less object is dominated by its global shape, color, reflectance properties, and the configuration of light sources. The lack of texture implies that the object cannot be reliably recognized with traditional techniques relying on photometric local patch detectors and descriptors [9,31]. 
Instead, recent approaches that can deal with texture-less objects have focused on local 3D feature description [33,51,19], and semi-global or …

arXiv:1701.05498v1 [cs.CV]
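The abstract defines the 6D pose as a translation and rotation of a rigid object. As a minimal sketch, and not the T-LESS tooling or the method of any paper listed here, the six numbers can be read as an axis-angle rotation (rx, ry, rz) plus a translation (tx, ty, tz). The functions `rotation_from_axis_angle`, `apply_pose` and `project`, the axis-angle parameterisation, and the pinhole intrinsics `fx, fy, cx, cy` are all illustrative assumptions. The pose maps a model-frame point into the camera frame, and a projection then gives its pixel position in a single RGB image:

```python
import math

def rotation_from_axis_angle(rx, ry, rz):
    """Rodrigues' formula: axis-angle vector (radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Negligible rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]_x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def apply_pose(pose, point):
    """Map a model-frame point into the camera frame: x_cam = R x + t.

    pose is a hypothetical 6-tuple (rx, ry, rz, tx, ty, tz): 3 rotation
    parameters (axis-angle) and 3 translation parameters.
    """
    rx, ry, rz, tx, ty, tz = pose
    R = rotation_from_axis_angle(rx, ry, rz)
    x, y, z = point
    return (
        R[0][0] * x + R[0][1] * y + R[0][2] * z + tx,
        R[1][0] * x + R[1][1] * y + R[1][2] * z + ty,
        R[2][0] * x + R[2][1] * y + R[2][2] * z + tz,
    )

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (z > 0) to pixel coordinates."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

# Usage: rotate 90 degrees about z, then shift by (0.1, 0, 0.5).
p = apply_pose((0.0, 0.0, math.pi / 2, 0.1, 0.0, 0.5), (1.0, 0.0, 0.0))
u, v = project(p, 500.0, 500.0, 320.0, 240.0)
```

The axis-angle form is used only because it makes the "six degrees of freedom" explicit (three for rotation, three for translation). Quaternions or a full 3x3 rotation matrix are equally common choices.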



“…if they point roughly in the same direction. This is similar to the orthogonal line search proposed in [12]. The edge-to-edge association provides multiple P2L tasks per link, which are updated at each iteration by rendering the new estimated state.…”

Section: E Tracking Objective (mentioning; confidence: 96%)

“…Visual features: Different sparse and dense visual features have been used in tracking literature to establish correspondences between the observed and estimated state of a 3D model. Early work in this area used dense features like colour image edges [11], [1], [12] and depth images [9]. These correspondences are based on the local appearance of the estimated state and change with each iteration of the optimisation.…”

Section: Related Work (mentioning; confidence: 99%)


Learning-driven Coarse-to-Fine Articulated Robot Tracking

Rauch, Ivan, Hospedales, et al. (2019)

2019 International Conference on Robotics and Automation (ICRA)


In this work we present an articulated tracking approach for robotic manipulators, which relies only on visual cues from colour and depth images to estimate the robot's state when interacting with or being occluded by its environment. We hypothesise that articulated model fitting approaches can only achieve accurate tracking if subpixel-level accurate correspondences between observed and estimated state can be established. Previous work in this area has exclusively relied on either discriminative depth information or colour edge correspondences as tracking objective and required initialisation from joint encoders. In this paper we propose a coarse-to-fine articulated state estimator, which relies only on visual cues from colour edges and learned depth keypoints, and which is initialised from a robot state distribution predicted from a depth image. We evaluate our approach on four RGB-D sequences showing a KUKA LWR arm with a Schunk SDH2 hand interacting with its environment and demonstrate that this combined keypoint and edge tracking objective can estimate the palm position with an average error of 2.5 cm without using any joint encoder sensing.


“…Feature Extraction Phase. During an off-line feature extraction phase, 3D pose [180], [34], [41], [181] or 6D pose [176], [179], [178], [182], [177], [23], [2], [24], [25] annotated templates involved in the training data are represented with robust feature descriptors. Features are manually-crafted utilizing the available shape, geometry, and appearance information [176], [179], [178], [182], [177], [23], [2], [25], and the recent paradigm in the field is to deep learn those using neural net architectures [180], [34], [41], [181].…”

Section: Template Matching (mentioning; confidence: 99%)

A review on object pose recovery: From 3D bounding box detectors to full 6D pose estimators

Şahin, Garcia-Hernando, Sock, et al. (2020)

Image and Vision Computing


Object pose recovery has gained increasing attention in the computer vision field as it has become an important problem in rapidly evolving technological areas related to autonomous driving, robotics, and augmented reality. Existing review-related studies have addressed the problem at visual level in 2D, going through the methods which produce 2D bounding boxes of objects of interest in RGB images. The 2D search space is enlarged either using the geometry information available in the 3D space along with RGB (Mono/Stereo) images, or utilizing depth data from LIDAR sensors and/or RGB-D cameras. 3D bounding box detectors, producing category-level amodal 3D bounding boxes, are evaluated on gravity aligned images, while full 6D object pose estimators are mostly tested at instance-level on the images where the alignment constraint is removed. Recently, 6D object pose estimation has been tackled at the level of categories. In this paper, we present the first comprehensive and most recent review of the methods on object pose recovery, from 3D bounding box detectors to full 6D pose estimators. The methods mathematically model the problem as a classification, regression, classification & regression, template matching, and point-pair feature matching task. Based on this, a mathematical-model-based categorization of the methods is established. Datasets used for evaluating the methods are investigated with respect to the challenges, and evaluation metrics are studied. Quantitative results of experiments in the literature are analysed to show which category of methods performs best across what types of challenges. The analyses are further extended comparing two methods, which are our own implementations, so that the outcomes from the public results are further solidified. Current position of the field is summarized regarding object pose recovery, and possible research directions are identified.

