An Adaptive Viewpoint Transformation Network for 3D Human Pose Estimation

Liang, Guoqiang; Zhong, Xiangping; Ran, Lingyan; Zhang, Yanning

doi:10.1109/access.2020.3013917

Cited by 4 publications

(2 citation statements)

References 30 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Most conversational corpora consist of TV interviews and theatrical plays that have shown themselves to be an appropriate resource of spontaneous conversational expressions, and are significantly more suitable for research in wider ‘discourse concepts’ than any artificially recorded material [ 37 ]. Most methods related to 3D Pose and Shape Estimation from monocular sources refer to (Deep) Convolutional Neural Networks ((D) CNNs) and leverage 2D joint tracking and predict 3D joint poses in the form of stick figures [ 38 , 39 , 40 , 41 , 42 ]. The major challenge with deep learning and similar probabilistic approaches is that the tracking process involves predicting the most probable configuration of the artificial skeleton.…”

Section: Introductionmentioning

confidence: 99%

Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda–Lucas–Tomasi Tracker and Denavit–Hartenberg-Based Kinematic Model

Mocnik

Kačič

Šafarič

et al. 2022

Sensors

View full text Add to dashboard Cite

In order to recreate viable and human-like conversational responses, the artificial entity, i.e., an embodied conversational agent, must express correlated speech (verbal) and gestures (non-verbal) responses in spoken social interaction. Most of the existing frameworks focus on intent planning and behavior planning. The realization, however, is left to a limited set of static 3D representations of conversational expressions. In addition to functional and semantic synchrony between verbal and non-verbal signals, the final believability of the displayed expression is sculpted by the physical realization of non-verbal expressions. A major challenge of most conversational systems capable of reproducing gestures is the diversity in expressiveness. In this paper, we propose a method for capturing gestures automatically from videos and transforming them into 3D representations stored as part of the conversational agent’s repository of motor skills. The main advantage of the proposed method is ensuring the naturalness of the embodied conversational agent’s gestures, which results in a higher quality of human-computer interaction. The method is based on a Kanade–Lucas–Tomasi tracker, a Savitzky–Golay filter, a Denavit–Hartenberg-based kinematic model and the EVA framework. Furthermore, we designed an objective method based on cosine similarity instead of a subjective evaluation of synthesized movement. The proposed method resulted in a 96% similarity.

show abstract

Section: Introductionmentioning

confidence: 99%

Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda–Lucas–Tomasi Tracker and Denavit–Hartenberg-Based Kinematic Model

Mocnik

Kačič

Šafarič

et al. 2022

Sensors

View full text Add to dashboard Cite

show abstract

“…It is widely considered a fundamental problem in computer vision due to its many downstream applications, including action recognition [6]- [11] and human tracking [12]- [14]. In particular, it is a precursor to 3D human pose estimation [15]- [17], which serves as a potential alternative to invasive marker-based motion capture.…”

Section: Introductionmentioning

confidence: 99%

EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation Using Accelerated Neuroevolution With Weight Transfer

et al. 2021

View full text Add to dashboard Cite

Neural architecture search has proven to be highly effective in the design of efficient convolutional neural networks that are better suited for mobile deployment than hand-designed networks. Hypothesizing that neural architecture search holds great potential for human pose estimation, we explore the application of neuroevolution, a form of neural architecture search inspired by biological evolution, in the design of 2D human pose networks for the first time. Additionally, we propose a new weight transfer scheme that enables us to accelerate neuroevolution in a flexible manner. Our method produces network designs that are more efficient and more accurate than state-of-the-art hand-designed networks. In fact, the generated networks process images at higher resolutions using less computation than previous hand-designed networks at lower resolutions, allowing us to push the boundaries of 2D human pose estimation. Our base network designed via neuroevolution, which we refer to as EvoPose2D-S, achieves comparable accuracy to SimpleBaseline while being 50% faster and 12.7x smaller in terms of file size. Our largest network, EvoPose2D-L, achieves new state-of-the-art accuracy on the Microsoft COCO Keypoints benchmark, is 4.3x smaller than its nearest competitor, and has similar inference speed. The code is publicly available at https://github.com/wmcnally/evopose2d.

show abstract

A Survey of Recent Advances on Two-Step 3D Human Pose Estimation

Manesco

Marana

2022

Intelligent Systems

View full text Add to dashboard Cite

An Adaptive Viewpoint Transformation Network for 3D Human Pose Estimation

Cited by 4 publications

References 30 publications

Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda–Lucas–Tomasi Tracker and Denavit–Hartenberg-Based Kinematic Model

Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda–Lucas–Tomasi Tracker and Denavit–Hartenberg-Based Kinematic Model

EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation Using Accelerated Neuroevolution With Weight Transfer

A Survey of Recent Advances on Two-Step 3D Human Pose Estimation

Contact Info

Product

Resources

About