ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Su, Yongzhi; Saleh, Mahdi; Fetzer, Torben; Rambach, Jason; Navab, Nassir; Busam, Benjamin; Stricker, Didier; Tombari, Federico

doi:10.1109/cvpr52688.2022.00662

Cited by 94 publications

(34 citation statements)

References 52 publications

“…Occlusion handling is an important challenge for object pose estimation. Figure 1 presents the performance of general purpose methods [45], [75], [80], [78], [93], [94], [112] compared to those designed for occlusion handling [43], [49], [98], [100], [104], [105], [108], [110]. Evaluations on the LM [35] and the LMO [10] dataset are presented.…”

Section: B Occlusion Handling (mentioning)

confidence: 99%

“…Early deep learning works identified performance improvements when using keypoints as regression target instead of directly regressing the 6D pose [98], [17], [82], [108]. Such geometric correspondences are nowadays the most commonly used surrogate training targets for representing 6D object poses [18], [31], [37], [46], [43], [94], [100], [64]. The 6D pose is derived by registering the estimated 2D correspondences to the corresponding ground-truth 3D ones.…”

Section: Pose Representations (mentioning)

confidence: 99%

“…Finding correspondence formulations to fully replace the current standard ones is an important topic since performance gains are expected [94]. Alternatively, designing different ways of learning that do not explicitly use 6D poses or 2D-3D correspondences as regression targets is gaining momentum [73], [104].…”

Section: Pose Representations (mentioning)

confidence: 99%

“…Apart from estimating surrogate regression targets in a separate stage, as described in Section II-C, pose estimation research tends to use separate networks for detection and for pose estimation, often creating one network per object of interest, in order to improve accuracy [78], [31], [53], [94], [58], [80]. The main problems connected to multiobject learning is that imbalances need to be handled well to reduce the network bias.…”

Section: B Multi-object Learning (mentioning)

confidence: 99%

Open Challenges for Monocular Single-shot 6D Object Pose Estimation

Thalhammer, Peter, Weibel et al. 2023. Preprint.

Object pose estimation is a non-trivial task that enables robotic manipulation, bin picking, augmented reality, and scene understanding, to name a few use cases. Monocular object pose estimation gained considerable momentum with the rise of high-performing deep learning-based solutions and is particularly interesting for the community since sensors are inexpensive and inference is fast. Prior works establish the comprehensive state of the art for diverse pose estimation problems. Their broad scopes make it difficult to identify promising future directions. We narrow down the scope to the problem of single-shot monocular 6D object pose estimation, which is commonly used in robotics, and thus are able to identify such trends. By reviewing recent publications in robotics and computer vision, the state of the art is established at the union of both fields. Following that, we identify promising research directions in order to help researchers to formulate relevant research ideas and effectively advance the state of the art. Findings include that methods are sophisticated enough to overcome the domain shift and that occlusion handling is a fundamental challenge. We also highlight problems such as novel object pose estimation and challenging materials handling as central challenges to advance robotics.

“…With the development of deep neural networks (DNNs), early methods [1, 14, 25, 48] formulated pose estimation as a regression problem, directly mapping the input image to the 6D object pose. More recently, most works [5, 20, 22, 32, 34, 37, 38, 40, 41, 42, 43] draw inspiration from geometry and seek to predict 2D-3D correspondences… [Figure 1: Difference between other differentiable PnP losses and our proposed loss.]…”

Section: Introduction (mentioning)

confidence: 99%

SD-Pose: Semantic Decomposition for Cross-Domain 6D Object Pose Estimation

Salzmann et al. 2021. AAAI.

The current leading 6D object pose estimation methods rely heavily on annotated real data, which is highly costly to acquire. To overcome this, many works have proposed to introduce computer-generated synthetic data. However, bridging the gap between the synthetic and real data remains a severe problem. Images depicting different levels of realism/semantics usually have different transferability between the synthetic and real domains. Inspired by this observation, we introduce an approach, SD-Pose, that explicitly decomposes the input image into multi-level semantic representations and then combines the merits of each representation to bridge the domain gap. Our comprehensive analyses and experiments show that our semantic decomposition strategy can fully utilize the different domain similarities of different representations, thus allowing us to outperform the state of the art on modern 6D object pose datasets without accessing any real data during training.
