2019 International Conference on Computational Science and Computational Intelligence (CSCI)
DOI: 10.1109/csci49370.2019.00052

Temporal 3D Human Pose Estimation for Action Recognition from Arbitrary Viewpoints

Abstract: This work presents a new view-invariant action recognition system that is able to classify human actions by using a single RGB camera, including challenging camera viewpoints. Understanding actions from different viewpoints remains an extremely challenging problem, due to depth ambiguities, occlusion, and large variety of appearances and scenes. Moreover, using only the information from the 2D perspective gives different interpretations for the same action seen from different viewpoints. Our system operates in…

Cited by 3 publications (2 citation statements)
References 28 publications

“…Indeed, these two tasks are closely related. Every 3D pose can be projected to a 2D pose, and a 3D pose can also be inferred using 2D pose estimation [48]. Most current human pose estimation algorithms focus on predicting the coordinates of human keypoints, i.e., keypoint localization, which describes the human pose by determining the spatial relationships between keypoints using prior knowledge.…”
Section: Related Work
confidence: 99%
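The 3D-to-2D relationship noted in this excerpt can be illustrated with a small sketch. The following is not taken from the cited paper; it assumes a hypothetical pinhole camera with made-up intrinsics and an arbitrary three-joint skeleton, and only shows how a 3D pose collapses to a 2D pose (discarding depth):

```python
# Minimal sketch: projecting a 3D pose to 2D with a pinhole camera model.
# The intrinsics (fx, fy, cx, cy) and the example joints are hypothetical,
# chosen only to illustrate the 3D-to-2D projection described above.
import numpy as np

def project_to_2d(joints_3d: np.ndarray, fx: float, fy: float,
                  cx: float, cy: float) -> np.ndarray:
    """Project (J, 3) camera-space joints [X, Y, Z] to (J, 2) pixel coordinates."""
    x = fx * joints_3d[:, 0] / joints_3d[:, 2] + cx
    y = fy * joints_3d[:, 1] / joints_3d[:, 2] + cy
    return np.stack([x, y], axis=1)

# Example: three joints roughly 3 m in front of the camera.
pose_3d = np.array([[0.0, -0.5, 3.0],   # head
                    [0.0,  0.0, 3.0],   # hip
                    [0.2,  0.9, 3.1]])  # foot
pose_2d = project_to_2d(pose_3d, fx=1145.0, fy=1145.0, cx=512.0, cy=512.0)
print(pose_2d)  # each 3D joint maps to a single 2D keypoint; depth is lost
```

Because depth is discarded by the projection, many distinct 3D poses map to the same 2D keypoints, which is why the inverse problem (lifting 2D to 3D) needs additional priors or temporal context.
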
“…Estimating the 3D location from individual frames leads to temporally incoherent results, where the independent error in each frame causes unstable 3D position estimates over the video sequence. Thus, in our work, we follow the approach proposed in [22,17] for human pose estimation, where a fully convolutional architecture performs temporal convolution over 2D skeleton joint positions in order to estimate the 3D skeleton in a video. The function g(·) is therefore approximated by a Sequence-to-Sequence (Seq2Seq) Temporal Convolutional Network (TCN) using 1D temporal convolutions, as shown in Figure 4.…”
Section: 3D Trajectory Estimation
confidence: 99%
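The temporal-convolution idea described in this excerpt can be sketched roughly as follows. This is not the citing authors' network; it is a minimal illustration, assuming a 17-joint skeleton and arbitrary layer widths, of lifting a short window of 2D keypoints to a 3D pose with dilated 1D temporal convolutions:

```python
# Minimal sketch (assumptions: 17 joints, 256 channels, a 9-frame receptive field)
# of regressing the 3D skeleton of the centre frame from a window of 2D joint
# positions using dilated 1D temporal convolutions.
import torch
import torch.nn as nn

class TemporalPoseLifter(nn.Module):
    def __init__(self, num_joints: int = 17, channels: int = 256):
        super().__init__()
        in_ch = num_joints * 2          # (x, y) per joint, stacked as channels
        out_ch = num_joints * 3         # (x, y, z) per joint
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, channels, kernel_size=3, dilation=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(channels, out_ch, kernel_size=1),
        )
        self.num_joints = num_joints

    def forward(self, joints_2d: torch.Tensor) -> torch.Tensor:
        # joints_2d: (batch, frames, joints, 2) -> (batch, joints*2, frames)
        b, t, j, _ = joints_2d.shape
        x = joints_2d.reshape(b, t, j * 2).permute(0, 2, 1)
        y = self.net(x)                 # temporal receptive field of 9 frames
        return y.permute(0, 2, 1).reshape(b, -1, self.num_joints, 3)

# Usage: a 9-frame window of 2D keypoints yields the 3D pose of the centre frame.
model = TemporalPoseLifter()
window_2d = torch.randn(1, 9, 17, 2)
pose_3d = model(window_2d)
print(pose_3d.shape)  # torch.Size([1, 1, 17, 3])
```

Convolving over the temporal axis in this way smooths the per-frame estimation error mentioned in the excerpt, since each 3D prediction is conditioned on several neighbouring frames rather than on a single frame.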