2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00631
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

Cited by 23 publications (5 citation statements)
References 49 publications
“…With the advent of multi-modal models, MotionCLIP (Tevet et al 2022) harnessed the latent space of CLIP (Huang et al 2019). The diffusion model emerged as a novel tool, with MotionDiffuse (Zhang et al 2022) and ReMoDiffuse (Zhang et al 2023b) generating vivid, semantically consistent, and high-fidelity motion sequences. This exciting domain continues to evolve, pushing the boundaries of text-driven motion synthesis.…”
Section: Related Work (Text-to-Motion Methods)
confidence: 99%
“…While multi-view camera algorithms [AARS13, BSC13, DFJ*22] have achieved higher accuracy, they often require laborious camera system calibration. Mono-camera approaches with optimization techniques [BKL*16, KPD19] and neural networks [PZDD17, WLLL22, HPY*22] lack depth information and struggle to track global translations. Despite offering an additional depth channel, RGBD-based solutions [BMB*11, MSS*17, YZ21] are hindered by limited camera resolution and field of view (FOV), which make them impractical for product-level applications.…”
Section: Related Work
confidence: 99%
“…It means that the possible set of actions that an actor can perform should concern the surrounding environment (Tahtali 2018, 2021). Affordance can be considered in various tasks that involve the visual understanding of the human, such as hand pose estimation (Grady et al 2021; Corona et al 2020; Williams and Mahapatra 2019), 3D human avatar generation (Li et al 2019b; Zhang et al 2020a; Hassan et al 2021), 3D pose generation (Wang et al 2019), motion prediction (Huang et al 2022; Cao et al 2020; Wang et al 2021), object affordance prediction (Do, Nguyen, and Reid 2018; Kim and Sukhatme 2014; Fang et al 2018), and shape estimation (Clever et al 2020).…”
Section: Related Work
confidence: 99%