2022
DOI: 10.1609/aaai.v36i2.20089

REMOTE: Reinforced Motion Transformation Network for Semi-supervised 2D Pose Estimation in Videos

Abstract: Existing approaches for 2D pose estimation in videos often require a large number of dense annotations, which are costly and labor intensive to acquire. In this paper, we propose a semi-supervised REinforced MOtion Transformation nEtwork (REMOTE) to leverage a few labeled frames and temporal pose variations in videos, which enables effective learning of 2D pose estimation in sparsely annotated videos. Specifically, we introduce a Motion Transformer (MT) module to perform cross frame reconstruction, aiming to l…

Citations: cited by 5 publications (2 citation statements)
References: 37 publications
“…Compared to end-to-end 3D pose estimation, the 2D-to-3D pipeline approach divides the task into two independent parts: 2D pose estimation from the image and lifting the 2D pose to a 3D pose. Recent 2D-to-3D lifting methods [6, 19-23] have demonstrated superior performance compared to end-to-end approaches, owing to the reliable and effective 2D keypoint detection methods developed in previous works [24-27]. To advance 2D-to-3D pose estimation, a Graph Stacked Hourglass Network [22] is introduced.…”
Section: Related Work (mentioning)
confidence: 99%
“…Ma et al. [114] recently proposed a semi-supervised approach that utilizes labeled frames and temporal dynamics (predicted key points) to address the problem of limited availability of temporally sparse annotations in videos. They introduced the REinforced MOtion Transformation nEtwork (REMOTE) framework, where a Motion Transformer (MT) and an RL-based Frame Selection Agent (FSA) are combined.…”
Section: Single Pose Estimation, Video-based (mentioning)
confidence: 99%