2022
DOI: 10.3390/s22114109

Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos

Abstract: Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet, real-world applications require depth estimations and the ability to determine the distances between people in a scene. Therefore, it is necessary to recover the 3D absolute poses of several people. However, this is still a challenge when using cameras from single points of view. Furthermore, the previously proposed systems typicall…

Cited by 8 publications (8 citation statements)
References 89 publications

“…In the same comparison context and beyond the ground truth accuracy, Table 4 illustrates another comparison, using the matched ground truth accuracy as a parameter, with Refs. 62 and 60; therefore, the suggested method surpasses existing 3D multi-person position recognition algorithms by a wide margin. However, there is still significant opportunity for improvement.…”
Section: Numerical Results (mentioning)

“…[169] introduces a two-stage pipeline and presents a unified framework that combines YOLOv5, HRNet, and TCN for real-time 2D/3D human pose estimation. In the work presented in [170], a similar pipeline was pursued, but it was expanded by incorporating a root depth estimator, a feature not present in the approach outlined in [169]. This incorporation enables the system to derive camera-centric coordinates and integrates YOLOv3 for human detection, HRNet as the 2D pose estimator, and GAST-Net for 3D root-relative pose reconstruction.…”
Section: Unified Framework For Real-time Applications (mentioning)

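The step that turns a root-relative 3D pose into camera-centric coordinates once a root depth is available can be illustrated with a minimal sketch. It assumes an undistorted pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`, in pixels), a root joint whose 2D pixel location comes from the 2D pose estimator, and an absolute root depth supplied by some root depth estimator. The function names are hypothetical, and this is a generic pinhole back-projection, not the cited authors' implementation.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_root(u: float, v: float, depth: float,
                     fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift the root joint's pixel (u, v) to camera coordinates at a given depth.

    Assumption: undistorted pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy) in pixels; depth is the root's Z (e.g. metres).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_camera_centric(root_relative: List[Point3D],
                      root_px: Tuple[float, float], root_depth: float,
                      fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3D]:
    """Offset a root-relative pose (root at the origin) by the absolute root position.

    Hypothetical helper: every joint is translated by the back-projected root,
    so relative joint offsets are preserved and only the global position changes.
    """
    rx, ry, rz = backproject_root(root_px[0], root_px[1], root_depth,
                                  fx, fy, cx, cy)
    return [(x + rx, y + ry, z + rz) for (x, y, z) in root_relative]
```

For example, a root at pixel (700, 400) with a predicted depth of 5 m, `fx = fy = 1000` and `(cx, cy) = (640, 360)` back-projects to (0.3, 0.2, 5.0) m, and every root-relative joint is then shifted by that vector. Applying this per detected person places all people in one camera frame, which is what makes inter-person distances measurable.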
“…Adopting a top-down approach similar to Refs. [169,170], Dong et al. [171] proposed an approach for multi-person 3D pose estimation. However, a notable distinction lies in their consideration of multiple views instead of a monocular view.…”
Section: Unified Framework For Real-time Applications (mentioning)

“…Over the past few years, deep neural networks have achieved state-of-the-art performance in computer vision [1][2][3][4], natural language processing [5][6][7], reinforcement learning [8][9][10], and various other fields [11][12][13]. However, with the increasing depth and width of networks, for example from the shallow LeNet to the wider Inception structure in GoogLeNet and the deeper ResNet convolutional architecture, as well as the currently popular transformer architecture, the number of parameters in deep models is constantly growing. This, in turn, leads to a series of problems, such as redundant network parameters, more demanding hardware requirements, and difficulty in training, and it severely limits the application of large deep models in low-memory or high-real-time conditions.…”
Section: Introduction (mentioning)