2022
DOI: 10.1109/lra.2022.3145494

AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation

Abstract: In this letter, we present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments that uses a team of autonomous unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation. Existing methods are limited by calibrated cameras and offline processing. Thus, we present the first method (AirPose) to estimate human pose and shape using images captured by multiple extrinsically uncalibrated flying cameras. AirPose itself calibrates the cameras relative to the pe…

Cited by 23 publications (16 citation statements)
References 29 publications
“…These are limited in the number of subjects, clothing variety, and scenarios. Synthetic data is being used to solve these problems [26], [27]. However, such data does not include either the full camera's state, IMU readings, scene depth, LiDAR data, or offers the possibility to easily extend it after the experiment has been recorded (e.g.…”
Section: B. Dynamic Content
confidence: 99%
“…with additional cameras or sensors), thus is generally unusable for any robotics application. Indeed, synthetic datasets are usually developed by stitching people over image backgrounds [27], statically placing them in some limited environment [26] and often recorded with static monocular cameras that take single pictures [26]. Furthermore, many of those are generated without any clothing information [28].…”
Section: B. Dynamic Content
confidence: 99%
“…The initial SMPL position and orientation in the first frame act as the pivot for all the optimizing parameters. Since optimizing all the parameters together is more likely to get stuck in a local minimum, we follow previous work [23] and do optimization in three phases. In all the optimization stages, we minimize the same loss function, which is a weighted combination of multiple loss terms.…”
Section: Camera and Human Pose Estimation
confidence: 99%
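The staged optimization the excerpt describes — minimizing one weighted loss while progressively enlarging the set of free parameters, so that early phases anchor the solution before later phases refine it — can be sketched as follows. All names, the toy quadratic loss terms, and the finite-difference optimizer are our own illustration, not the paper's actual objective or solver.

```python
import numpy as np

def total_loss(global_pose, body_pose, shape, w=(1.0, 1.0, 0.1)):
    # Toy quadratic stand-ins for the real reprojection / prior / smoothing
    # terms; the weighted combination mirrors the structure in the excerpt.
    e_data = np.sum((global_pose - 1.0) ** 2) + np.sum((body_pose - 0.5) ** 2)
    e_prior = np.sum(shape ** 2)
    e_smooth = np.sum(np.diff(global_pose) ** 2)
    return w[0] * e_data + w[1] * e_prior + w[2] * e_smooth

def descend(loss_fn, params, active, lr=0.1, steps=200, eps=1e-5):
    # Finite-difference gradient descent over the active parameter subset only;
    # inactive parameters stay pinned, acting as the "pivot" for later phases.
    params = [p.copy() for p in params]
    for _ in range(steps):
        for i in active:
            g = np.zeros_like(params[i])
            base = loss_fn(*params)
            for j in range(params[i].size):
                bump = [p.copy() for p in params]
                bump[i].flat[j] += eps
                g.flat[j] = (loss_fn(*bump) - base) / eps
            params[i] -= lr * g
    return params

# Three phases: optimize the global pose alone, then add body pose, then shape.
params = [np.zeros(3), np.zeros(4), np.zeros(2)]
for phase_active in ([0], [0, 1], [0, 1, 2]):
    params = descend(total_loss, params, phase_active)
```

Optimizing everything jointly from scratch on this kind of non-convex objective risks the local minima the excerpt mentions; freezing subsets per phase is the standard mitigation.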
“…E_COS and E_CPS are the camera motion smoothing terms for camera orientation and position, respectively. Following the previous work [23], we use a simple L2 loss on the positions and the 6D representation of the camera orientations. They are given as…”
Section: Camera and Human Pose Estimation
confidence: 99%
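The smoothing terms the excerpt names can be sketched as simple L2 penalties on consecutive frames: one on camera positions, one on the 6D rotation representation (commonly the first two columns of the 3×3 rotation matrix). Function and variable names here are our own, and the exact weighting and summation in the paper may differ.

```python
import numpy as np

def smoothing_terms(positions, rotations):
    """Hedged sketch of L2 camera-motion smoothing.

    positions: (T, 3) camera positions per frame.
    rotations: (T, 3, 3) camera rotation matrices per frame.
    Returns (E_COS, E_CPS): orientation and position smoothing terms.
    """
    # 6D rotation representation: flatten the first two matrix columns.
    six_d = rotations[:, :, :2].reshape(len(rotations), 6)
    # L2 loss on consecutive-frame differences.
    e_cos = np.sum((six_d[1:] - six_d[:-1]) ** 2)
    e_cps = np.sum((positions[1:] - positions[:-1]) ** 2)
    return e_cos, e_cps
```

A static camera yields zero for both terms, and any jitter between frames is penalized quadratically, which is what makes these effective regularizers for per-frame camera estimates.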