2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00475

Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields

Abstract: We present an online approach to efficiently and simultaneously detect and track 2D poses of multiple people in a video sequence. We build upon the Part Affinity Fields (PAF) representation designed for static images, and propose an architecture that can encode and predict Spatio-Temporal Affinity Fields (STAF) across a video sequence. In particular, we propose a novel temporal topology cross-linked across limbs which can consistently handle body motions of a wide range of magnitudes. Additionally, we make the ove…
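For readers unfamiliar with PAFs, the following minimal sketch shows the limb-scoring step from the original static-image Part Affinity Fields work that STAF builds on: a candidate limb between two detected keypoints is scored by approximating the line integral of a 2D vector field along the segment joining them, and candidate pairs are then matched greedily. It is an illustration under stated assumptions, not the STAF architecture: the field is assumed to be a plain nested list indexed `field[y][x]`, lookups use the nearest pixel, and the function names `affinity_score` and `greedy_match` and the parameters `num_samples` and `min_score` are hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]
# Assumed layout: field[y][x] holds the 2D affinity vector (vx, vy) at that pixel.
Field = List[List[Vec]]


def affinity_score(field: Field, p1: Vec, p2: Vec, num_samples: int = 10) -> float:
    """Approximate the line integral of `field` along the segment p1 -> p2.

    Samples evenly spaced points on the segment (endpoints included), reads
    the field vector at the nearest pixel, and averages its dot product with
    the unit vector pointing from p1 to p2.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # degenerate limb: no direction to compare against
    ux, uy = dx / length, dy / length
    height, width = len(field), len(field[0])
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        x = p1[0] + t * dx
        y = p1[1] + t * dy
        # Nearest-pixel lookup, clamped to the grid bounds (hypothetical choice;
        # bilinear interpolation would also work).
        xi = min(max(int(round(x)), 0), width - 1)
        yi = min(max(int(round(y)), 0), height - 1)
        vx, vy = field[yi][xi]
        total += vx * ux + vy * uy
    return total / num_samples


def greedy_match(field: Field, cands_a: List[Vec], cands_b: List[Vec],
                 min_score: float = 0.1) -> List[Tuple[int, int, float]]:
    """Greedily pair candidates of part type A with part type B.

    All pairs are scored, sorted from highest to lowest score, and accepted
    as long as neither endpoint has already been used.
    """
    scored = []
    for i, a in enumerate(cands_a):
        for j, b in enumerate(cands_b):
            s = affinity_score(field, a, b)
            if s > min_score:  # hypothetical threshold to drop weak limbs
                scored.append((s, i, j))
    scored.sort(reverse=True)
    used_a, used_b = set(), set()
    pairs = []
    for s, i, j in scored:
        if i not in used_a and j not in used_b:
            pairs.append((i, j, s))
            used_a.add(i)
            used_b.add(j)
    return pairs
```

With a field in which every vector points along +x, a horizontal segment scores 1.0, the reversed segment scores -1.0, and a vertical segment scores 0.0. Applying the same kind of score to a field predicted between frames t-1 and t is one plausible way to link keypoints over time, but how STAF's recurrent, cross-linked temporal fields are actually decoded is specified in the paper and is not reproduced here.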

Cited by 118 publications (81 citation statements). References 33 publications.

“…Finally, a particularly important component to develop in future work with SLEAP will be to incorporate learnable tracking to enable the pose estimation models to better take advantage of temporal context. For example, the PAF representation could be extended to the time domain [27]. The top-down approach can also combine detection and tracking [31], although this requires sets of contiguous ground-truth frames which greatly increases the time and effort required for labeling.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Then, we extract the feature data of each canvas using a PoseNet model that can be called with the browser [22, 53]. In recent years, other pose estimate models such as the ones described in [54, 55, 56] have demonstrated potential for use in smart homes. To improve the process of continuously extracting feature data, we introduce a method to implement an offline PoseNet model.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…It uses an adaptive background subtraction in order to identify foreground regions for catching users' movements. In [29], authors present an online approach to simultaneously detect 2D poses of multiple people in a video sequence. They exploit Part Affinity Field (PAF) representations designed for static images, and they propose an architecture that can encode Spatio-Temporal Affinity Fields (STAF) across a video sequence.…”
Section: Related Work; citation type: mentioning
confidence: 99%