SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos

Chen, Xin; Pang, Anqi; Yang, Wei; Ma, Yuexin; Xu, Lu; Yu, Jingyi

doi:10.48550/arxiv.2104.11452

Cited by 5 publications

(7 citation statements)

References 56 publications

“…Our work can be classified under AQA and SA, which involves the computer vision-based quantification of the quality of movements and actions. Works in AQA and SA have mainly been focused on domains like physiotherapy [6,19,25,33,36], Olympic sports [3,24,28,35,39,41], various types of skills [5,20,26,38]. However, workout form assessment, especially, in real-world conditions, has not received much attention.…”

Section: Related Work (mentioning)

confidence: 99%

“…This is especially prevalent in non-daily action classes like fitness and sports domains. This can be mitigated, for example, by annotating domain-specific datasets [3], but that requires a considerable amount of manual annotation efforts, financial resources, and 3D annotations can only be obtained in controlled conditions. Therefore, we propose to learn domain-specific pose-sensitive representations from unlabeled videos, which can be finetuned using only a small labeled dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment

Parmar, Gharat, Rhodin, 2022

Preprint

Maintaining proper form while exercising is important for preventing injuries and maximizing muscle mass gains. While fitness apps are becoming popular, they lack the functionality to detect errors in workout form. Detecting such errors naturally requires estimating users' body pose. However, off-the-shelf pose estimators struggle to perform well on the videos recorded in gym scenarios due to factors such as camera angles, occlusion from gym equipment, illumination, and clothing. To aggravate the problem, the errors to be detected in the workouts are very subtle. To that end, we propose to learn exercise-specific representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection. In particular, our domain knowledge-informed self-supervised approaches exploit the harmonic motion of the exercise actions, and capitalize on the large variances in camera angles, clothes, and illumination to learn powerful representations. To facilitate our self-supervised pretraining and supervised finetuning, we curated a new exercise dataset, Fitness-AQA, comprising three exercises: BackSquat, BarbellRow, and OverheadPress. It has been annotated by expert trainers for multiple crucial and typically occurring exercise errors. Experimental results show that our self-supervised representations outperform off-the-shelf 2D- and 3D-pose estimators and several other baselines.

“…The high-end solutions [9,12,13,24] adopt studio-setup with dense cameras to produce high-quality reconstruction and surface motion, but the synchronized and calibrated multi-camera systems are both difficult to deploy and expensive. The recent low-end approaches [10,16,21,66] enable light-weight performance capture under the single-view setup or even hand-held capture setup or drone-based capture setup [68]. However, these methods require a naked human model or pre-scanned template.…”

Section: Related Work (mentioning)

confidence: 99%

Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions

Sun, Chen, et al., 2021

Proceedings of the 29th ACM International Conference on Multimedia

Self Cite

human-object rendering scheme, which combines direction-aware neural blending weight learning and spatial-temporal texture completion to provide high-resolution and photo-realistic texture results in the occluded scenarios. Extensive experiments demonstrate the effectiveness of our approach to achieve high-quality geometry and texture reconstruction in free viewpoints for challenging human-object interactions. CCS CONCEPTS • Computing methodologies → Image-based rendering.

“…The high-end solutions [Dou et al, 2017; Joo et al, 2018; Chen et al, 2019] require studio-setup with the dense view of cameras and a controlled imaging environment to generate high-fidelity reconstruction and high-quality surface motion, which are expensive and difficult to deploy. The recent low-end approaches [Xiang et al, 2019; Chen et al, 2021] enable light-weight performance capture under the single-view setup. However, these methods require a naked human model or pre-scanned template.…”

Section: Related Work (mentioning)

confidence: 99%

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

Pang, Chen, Luo, et al., 2021

Preprint

Self Cite

Recent neural rendering approaches for human activities achieve remarkable view synthesis results, but still rely on dense input views or dense training with all the capture frames, leading to deployment difficulty and inefficient training overload. However, existing advances will be ill-posed if the input is both spatially and temporally sparse. To fill this gap, in this paper we propose a few-shot neural human rendering approach (FNHR) from only sparse RGBD inputs, which exploits the temporal and spatial redundancy to generate photo-realistic free-view output of human activities. Our FNHR is trained only on the key-frames which expand the motion manifold in the input sequences. We introduce a two-branch neural blending to combine the neural point render and classical graphics texturing pipeline, which integrates reliable observations over sparse key-frames. Furthermore, we adopt a patch-based adversarial training process to make use of the local redundancy and avoid over-fitting to the key-frames, which generates fine-detailed rendering results. Extensive experiments demonstrate the effectiveness of our approach to generate high-quality free view-point results for challenging human performances under the sparse setting.
