HMOR: Hierarchical Multi-person Ordinal Relations for Monocular Multi-person 3D Pose Estimation

Wang, Can; Li, Jiefeng; Li, Wentao; Qian, Chen; Lu, Cewu

doi:10.1007/978-3-030-58580-8_15

Cited by 54 publications

(42 citation statements)

References 65 publications


“…We evaluate BEV on two multi-person datasets: in-the-wild using the 2D RH and in 3D using the synthetic AGORA [24] dataset. On RH, compared with previous methods [10,21,34,41], BEV is more accurate in relative depth reasoning and pose estimation. On AGORA, BEV significantly improves detection and achieves state-of-the-art results on "AGORA kids" in terms of the mesh reconstruction error.…”

Section: Introduction (mentioning)

confidence: 90%

“…While previous multi-person methods perform well in constrained experimental settings, they struggle with severe occlusion, diverse body size and appearance, the ambiguity of monocular depth, and in-the-wild cases [10,21,34,39,41]. These challenges lead to unsatisfactory performance in crowded scenes, including detection misses, similar predictions for overlapping people, and all predictions having a similar height.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, current methods lack sufficiently powerful representations to learn from these cases. A few learning-based methods have been proposed for depth reasoning of predicted body meshes [10,39] or 3D poses [21,34,41]. Unfortunately, they all reason about depth via 2D representations, such as RoI-aligned features [10,21], a 2D depth map [34,41], or multi-scale 2D center maps [39].…”

Section: Introduction (mentioning)

confidence: 99%

“…A few learning-based methods have been proposed for depth reasoning of predicted body meshes [10,39] or 3D poses [21,34,41]. Unfortunately, they all reason about depth via 2D representations, such as RoI-aligned features [10,21], a 2D depth map [34,41], or multi-scale 2D center maps [39]. These regression-based 2D representations have inherent drawbacks for representing the 3D world.…”

Section: Introduction (mentioning)

confidence: 99%


Putting People in their Place: Monocular Regression of 3D People in Depth

Sun, Liu, Bao et al. 2021

Preprint


“…When only one camera is available, the problem is underdetermined since many 3D poses may correspond to the same 2D pose. Leveraging the learning-based method, 3D poses can be recovered by lifting detected 2D poses [41,42,55], or directly regressing 3D poses [3,13,48,53], or by fitting parametric human body models [21,53]. However, the reconstruction accuracy of these methods is limited due to the depth ambiguities and strong occlusions when multiple humans are close to each other.…”

Section: Multi-person 3D Pose Estimation (mentioning)

confidence: 99%

Shape-aware Multi-Person Pose Estimation from Multi-View Images

Dong, Song, Chen et al. 2021

2021 IEEE/CVF International Conference on Computer Vision (ICCV)


In this paper we contribute a simple yet effective approach for estimating 3D poses of multiple people from multi-view images. Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space and then associates them into individual instances based on a confidence-aware majority voting technique. The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates. Moreover, a statistical parametric body model such as SMPL is leveraged as a regularizing prior for these 3D joint candidates. Specifically, both 3D poses and SMPL parameters are optimized jointly in an alternating fashion. Here the parametric models help in correcting implausible 3D pose estimates and filling in missing joint detections while updated 3D poses in turn guide obtaining better SMPL estimations. By linking 2D and 3D observations, our method is both accurate and generalizes to different data sources because it better decouples the final 3D pose from the inter-person constellation and is more robust to noisy 2D detections. We systematically evaluate our method on public datasets and achieve state-of-the-art performance. The code and video will be available on the project page: https://ait.ethz.ch/projects/2021/multi-human-pose/.
