2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.00280

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Abstract: Locating 3D objects from a single RGB image via Perspective-n-Point (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, allowing for partial learning of 2D-3D point correspondences by backpropagating the gradients of pose loss. Yet, learning the entire correspondences from scratch is highly challenging, particularly for ambiguous pose solutions, where the globally optimal pose is theoretically non-differenti…

Cited by 116 publications (55 citation statements)
References 50 publications
“…While in other sequences where only a partially occluded body is observed, the single-joint-based method always fails to locate the target person as expected. MonoLoco [30] and Mono3DBox [31] also perform poorly with a low recall and a high ALE, especially on Sequence II to IV. Mono3DPose [32] and MonoDepth [37] can work on TABLE I.…”
Section: Results (mentioning)
confidence: 99%
“…MonoLoc [30] first uses a neural network to detect joints of a person in an image, and then utilizes these estimated 2D joint positions to locate the person in 3D by a multi-task neural network. EPro-PnP [31] describes the pose of a person in the form of a 3D bounding box by integrating learnable 2D-3D correspondences. RootNet [32] develops a top-down pose estimation solution that computes the 3D poses of multiple people with respect to the camera coordinate frame.…”
Section: D Person Location Estimation (mentioning)
confidence: 99%
“…But since the PnP problem is not differentiable at some points [55], it becomes difficult to learn all points and weights in an end-to-end manner. Recently, EPro-PnP proposed by Chen et al [56] makes the PnP problem derivable by introducing a probability density distribution, which greatly enhances the adaptability of the monocular object pose estimation model. The deep learning-based 6DoF pose estimation work no longer relies on 3D template matching [57].…”
Section: Pose Estimation Methods (mentioning)
confidence: 99%
“…Algorithms for such 2D-3D correspondence registration are either classical ones, like the Perspective-n-Points (PnP) algorithm [54], or learned functions [13], [44], [100], [105].…”
Section: Pose Representations (mentioning)
confidence: 99%