Multi-Path Learning for Object Pose Estimation Across Domains

Sundermeyer, Martin; Durner, Maximilian; Puang, En Yen; Márton, Zoltán-Csaba; Vaškevičius, Narunas; Arras, Kai O.; Triebel, Rudolph

doi:10.1109/cvpr42600.2020.01393

Cited by 96 publications

(174 citation statements)

References 36 publications

Supporting

Mentioning

172

Contrasting

“…To handle symmetric objects, Tian et al [64] propose to uniformly sample rotation anchors and estimate deviations of the anchors to the target. In addition, Sundermeyer et al [60], [61] introduce an implicit way of representing 3-D rotations by training an autoencoder for image reconstruction, which does not need to predefine the symmetry axes for symmetric objects. We leverage this implicit 3-D rotation representation in our work and show how to combine it with particle filtering for 6-D object pose tracking.…”

Section: A Six-dimensional Object Pose Estimation (mentioning)

confidence: 99%

PoseRBPF: A Rao–Blackwellized Particle Filter for 6-D Object Pose Tracking

et al. 2021

Tracking 6-D poses of objects from videos provides rich information to a robot in performing different tasks such as manipulation and navigation. In this article, we formulate the 6-D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3-D rotation and the 3-D translation of an object are decoupled. This factorization allows our approach, called PoseRBPF, to efficiently estimate the 3-D translation of an object along with the full distribution over the 3-D rotation. This is achieved by discretizing the rotation space in a fine-grained manner and training an autoencoder network to construct a codebook of feature embeddings for the discretized rotations. As a result, PoseRBPF can track objects with arbitrary symmetries while still maintaining adequate posterior distributions. Our approach achieves state-of-the-art results on two 6-D pose estimation benchmarks. We open-source our implementation at https://github.com/NVlabs/PoseRBPF.
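
The codebook idea in the abstract above can be illustrated with a minimal sketch. It assumes that each discretized rotation has one embedding vector and that an observed embedding is compared against all of them by cosine similarity. The function names, the softmax with a `temperature` parameter, and the random toy codebook are illustrative assumptions, not the PoseRBPF implementation.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rotation_distribution(code, codebook, temperature=0.1):
    """Turn codebook similarities into a distribution over rotation bins.

    The softmax weighting is an assumption made for this sketch; the paper
    may define the observation likelihood differently.
    """
    sims = [cosine(code, entry) for entry in codebook]
    top = max(sims)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in sims]
    total = sum(weights)
    return [w / total for w in weights]


# Toy codebook: 8 hypothetical rotation bins with 16-dimensional embeddings.
random.seed(0)
codebook = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

# A query embedding that is a slightly perturbed copy of bin 3.
query = [v + 0.01 * random.gauss(0, 1) for v in codebook[3]]

probs = rotation_distribution(query, codebook)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 3))
```

Keeping the whole distribution rather than only the best bin is what lets a filter of this kind represent symmetric objects, where several rotation bins can score equally well.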

“…To extract the latent rotation feature, we train the autoencoder to reconstruct the observed points transformed from the observed depth map of the object. There are several advantages to this strategy: 1) the reconstruction of observed points is view-based and symmetry invariant [32,33], 2) the reconstruction of observed points is easier than that of a complete object model (shown in Table 2), and 3) more representative orientation feature can be learned (shown in Table 1).…”

Section: Rotation-aware Autoencoder (mentioning)

confidence: 99%

“…In [32,33], the authors also reconstructed the input images to observed views. However, the input and output of their models are 2D images that are different from our 3D point cloud input and output.…”

Section: Rotation-aware Autoencoder (mentioning)

confidence: 99%

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Chen, Jia, Chang, et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale-invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them by two residuals: the difference between the mean of object points and ground truth translation, and the difference between the mean size of the category and ground truth size, respectively. Finally, to increase the generalization ability of the FS-Net, we propose an online box-cage based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.
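
The residual formulation for translation and size described in the abstract above can be sketched as follows. The function name, the argument names, and the sign convention (residual = ground truth minus reference value) are assumptions made for illustration; the abstract does not fix the sign, and this is not the FS-Net code.

```python
def recover_translation_and_size(points, translation_residual,
                                 category_mean_size, size_residual):
    """Combine predicted residuals with their reference values.

    Assumed convention for this sketch:
      translation = mean of observed points + translation residual
      size        = category mean size      + size residual
    """
    n = len(points)
    # Centroid of the observed object points, one coordinate at a time.
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    translation = [c + r for c, r in zip(centroid, translation_residual)]
    size = [m + r for m, r in zip(category_mean_size, size_residual)]
    return translation, size


# Hypothetical observed points (metres) and hypothetical network outputs.
points = [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.0, 0.2, 1.2), (0.2, 0.2, 1.2)]
translation, size = recover_translation_and_size(
    points,
    translation_residual=(0.01, -0.01, 0.05),
    category_mean_size=(0.10, 0.10, 0.20),
    size_residual=(0.02, 0.0, -0.03),
)
print(translation, size)
```

Predicting a small residual around the point centroid or the category mean size, rather than the absolute quantity, keeps the regression target in a narrow range regardless of where the object sits in the scene.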

“…Although deep learning methods for 6D pose estimation have achieved very accurate results, most such methods are trained for particular objects and do not generalize to unseen objects without retraining, which can take tens of GPU-days [20], [2], [21], [4]. Several recent works tackled the zero-shot pose estimation problem by learning a latent object representation [6], [7], [8]. However, recent analysis has revealed that such methods perform poorly in cluttered scenes, even when a ground-truth bounding box is provided as input [9].…”

Section: Related Work a Zero-shot Pose Estimation (mentioning)

confidence: 99%

“…To address this issue, a number of zero-shot pose estimators have been developed. However, most zero-shot pose estimators only evaluate on sparse, uncluttered scenes where the object of interest is detected and cropped or is sitting on an empty table [6], [7], [8]. Evaluation of such methods in cluttered settings shows that such methods fail to provide reasonable performance, even with the addition of groundtruth bounding boxes or ground truth translation as input (see analysis in Okorn, et al [9] Appendix B).…”

Section: Introduction (mentioning)

confidence: 99%

OSSID: Online Self-Supervised Instance Detection by (and for) Pose Estimation

Gu, Okorn, Held 2022

Preprint

Real-time object pose estimation is necessary for many robot manipulation algorithms. However, state-of-the-art methods for object pose estimation are trained for a specific set of objects; these methods thus need to be retrained to estimate the pose of each new object, often requiring tens of GPU-days of training for optimal performance. In this paper, we propose the OSSID framework, leveraging a slow zero-shot pose estimator to self-supervise the training of a fast detection algorithm. This fast detector can then be used to filter the input to the pose estimator, drastically improving its inference speed. We show that this self-supervised training exceeds the performance of existing zero-shot detection methods on two widely used object pose estimation and detection datasets, without requiring any human annotations. Further, we show that the resulting method for pose estimation has a significantly faster inference speed, due to the ability to filter out large parts of the image. Thus, our method for self-supervised online learning of a detector (trained using pseudo-labels from a slow pose estimator) leads to accurate pose estimation at real-time speeds, without requiring human annotations. Supplementary materials and code can be found at https://georgegu1997.github.io/OSSID/
