2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01393
Multi-Path Learning for Object Pose Estimation Across Domains

Abstract: We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": While the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, …

Cited by 96 publications (174 citation statements)
References 36 publications
“…To handle symmetric objects, Tian et al [64] propose to uniformly sample rotation anchors and estimate deviations of the anchors to the target. In addition, Sundermeyer et al [60], [61] introduce an implicit way of representing 3-D rotations by training an autoencoder for image reconstruction, which does not need to predefine the symmetry axes for symmetric objects. We leverage this implicit 3-D rotation representation in our work and show how to combine it with particle filtering for 6-D object pose tracking.…”
Section: A. Six-Dimensional Object Pose Estimation
confidence: 99%
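The implicit rotation representation described in this excerpt is typically used via a codebook lookup: an encoder maps an object view to a latent code, and at test time that code is matched against precomputed codes of rendered views with known rotations, so symmetric views naturally share similar codes. A minimal sketch of that lookup idea, with a placeholder encoder (the real method trains a convolutional autoencoder; `encode` here is only a stand-in):

```python
import numpy as np

def encode(view):
    # Stand-in for a trained encoder: flatten and L2-normalise the view.
    z = view.reshape(-1).astype(float)
    return z / (np.linalg.norm(z) + 1e-8)

def build_codebook(rendered_views, rotations):
    # One latent code per rendered rotation of the training object.
    codes = np.stack([encode(v) for v in rendered_views])
    return codes, rotations

def lookup_rotation(query_view, codes, rotations):
    # Cosine similarity against the codebook; the arg-max rotation is the
    # implicit orientation estimate.
    z = encode(query_view)
    sims = codes @ z
    return rotations[int(np.argmax(sims))]
```

Because the orientation is recovered by similarity search rather than by regressing angles, no symmetry axes have to be declared in advance, which is the property the excerpt highlights.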
“…To extract the latent rotation feature, we train the autoencoder to reconstruct the observed points transformed from the observed depth map of the object. There are several advantages to this strategy: 1) the reconstruction of observed points is view-based and symmetry invariant [32,33], 2) the reconstruction of observed points is easier than that of a complete object model (shown in Table 2), and 3) a more representative orientation feature can be learned (shown in Table 1).…”
Section: Rotation-Aware Autoencoder
confidence: 99%
“…In [32,33], the authors also reconstructed the input images into observed views. However, the input and output of their models are 2D images, which differ from our 3D point cloud input and output.…”
Section: Rotation-Aware Autoencoder
confidence: 99%
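Point-set reconstruction objectives like the one in this excerpt are commonly trained with a symmetric Chamfer distance between the predicted and observed point sets, since it is invariant to point ordering. A minimal NumPy sketch of that loss (an illustrative formulation, not the cited authors' implementation):

```python
import numpy as np

def chamfer_distance(pred, target):
    # pred: (N, 3) predicted points; target: (M, 3) observed points.
    # Pairwise squared distances via broadcasting -> (N, M) matrix.
    d = np.sum((pred[:, None, :] - target[None, :, :]) ** 2, axis=-1)
    # For each predicted point, its nearest observed point (and vice versa);
    # averaging both directions makes the loss symmetric.
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Because the target is the *observed* (view-dependent) point set rather than the complete model, two symmetric poses that produce the same observation incur the same loss, which is the symmetry invariance the excerpt refers to.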
“…Although deep learning methods for 6D pose estimation have achieved very accurate results, most such methods are trained for particular objects and do not generalize to unseen objects without retraining, which can take tens of GPU-days [20], [2], [21], [4]. Several recent works tackled the zero-shot pose estimation problem by learning a latent object representation [6], [7], [8]. However, recent analysis has revealed that such methods perform poorly in cluttered scenes, even when a ground-truth bounding box is provided as input [9].…”
Section: Related Work: A. Zero-Shot Pose Estimation
confidence: 99%
“…To address this issue, a number of zero-shot pose estimators have been developed. However, most zero-shot pose estimators are evaluated only on sparse, uncluttered scenes where the object of interest is detected and cropped or is sitting on an empty table [6], [7], [8]. Evaluation of such methods in cluttered settings shows that they fail to provide reasonable performance, even with the addition of ground-truth bounding boxes or ground-truth translation as input (see the analysis in Okorn et al. [9], Appendix B).…”
Section: Introduction
confidence: 99%