2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01074

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

citations
Cited by 34 publications
(9 citation statements)
references
References 44 publications
“…In this work, we have focused purely on monocular video as this is the most widely available modality. We are also enthusiastic about integrating physics-based modeling in the inference process (42)(43)(44)(45) and ways to combine this with self-supervised learning (46). It is also important to make this system easier to use in a higher-throughput manner.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…The traditional world-to-image projection typically involves transferring the world coordinates of the object to the camera space and then projecting them to the image plane. To simplify the process, previous methods [42,74,10,16,21,69] directly regress human poses in the camera coordinate system and feed them to a projection function. However, such a simplified process introduces an ambiguous problem applying the same projection function to different scenes.…”
Section: Camera and Motion Decoupling (citation type: mentioning)
confidence: 99%
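
The two-step world-to-image projection described in the statement above can be illustrated with a minimal pinhole-camera sketch. This is not code from the cited or citing papers: the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the extrinsics (`rotation`, `translation`) are assumptions chosen only for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(point_world: Sequence[float],
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Vec3:
    """Step 1: move a world-space point into camera space, X_cam = R * X_world + t."""
    x, y, z = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def camera_to_image(point_cam: Sequence[float],
                    fx: float, fy: float,
                    cx: float, cy: float) -> Tuple[float, float]:
    """Step 2: perspective-project a camera-space point onto the image plane."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def project_world_point(point_world: Sequence[float],
                        rotation: Sequence[Sequence[float]],
                        translation: Sequence[float],
                        fx: float, fy: float,
                        cx: float, cy: float) -> Tuple[float, float]:
    """Full world-to-image projection: step 1 followed by step 2."""
    point_cam = world_to_camera(point_world, rotation, translation)
    return camera_to_image(point_cam, fx, fy, cx, cy)


# Hypothetical camera: identity rotation, scene shifted 5 units along the optical axis.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pixel = project_world_point((1.0, 2.0, 0.0), IDENTITY, (0.0, 0.0, 5.0),
                            fx=1000.0, fy=1000.0, cx=500.0, cy=400.0)
```

Regressing poses directly in camera space, as the statement describes, amounts to skipping `world_to_camera` and calling only `camera_to_image`, so the same projection function is reused regardless of the camera placement in each scene.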
“…Following previous baseline [42], we build these encoders upon temporal dilated convolutional networks. Moreover, inspired by previous skeleton-based 3D human pose estimation [31,42,74,10,66], we adopt two separate branches in the body-specific regressors to learn the global trajectory and root-relative body motions. Specifically, the trajectory branch learns the camera parameters α, ϕ, h and the global human transition t_b, while the motion branch learns the shape parameter β_b and pose parameter θ_b of the body model.…”
Section: Sequential Full-body Motion Recovery (citation type: mentioning)
confidence: 99%
“…Visual attention has been widely used in deep learning and achieves remarkable advances [59,69]. It has been exploited in computer vision tasks such as image recognition [9,10,33,48,61,71] and object detection among others [6,16,35,66,67]. CAM [77] provides the attention visualization of feature maps for model interpretable analysis.…”
Section: Related Work (citation type: mentioning)
confidence: 99%