The task of three-dimensional (3D) human pose estimation from a single image can be divided into two parts: (1) two-dimensional (2D) human joint detection from the image and (2) estimating a 3D pose from the 2D joints. Herein, we focus on the second part, i.e., estimating a 3D pose from 2D joint locations. The problem with existing methods is that they require either (1) a 3D pose dataset or (2) 2D joint locations in consecutive frames taken from a video sequence. We aim to solve these problems. For the first time, we propose a method that learns a 3D human pose without any 3D datasets. Our method can predict a 3D pose from 2D joint locations in a single image. Our system is based on generative adversarial networks, and the networks are trained in an unsupervised manner. Our primary idea is that, if the network can predict a 3D human pose correctly, the 3D pose that is projected onto a 2D plane should not collapse even if it is rotated perpendicularly. We evaluated the performance of our method using the Human3.6M and MPII datasets and showed that our network can predict a 3D pose well even when no 3D dataset is available during training.
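To make the rotation-projection idea concrete, here is a minimal sketch of the core geometric operation, assuming an orthographic camera and a per-joint depth predicted by the network. The 17-joint layout, array shapes, and function names are illustrative assumptions, not the paper's exact interface.

```python
import numpy as np

def rotate_and_project(joints_2d, depths, theta):
    """Lift 2D joints with predicted depths into 3D, rotate about the
    vertical (y) axis by theta, and project back onto the 2D plane.

    joints_2d: (J, 2) array of (x, y) joint locations
    depths:    (J,) array of predicted per-joint depths (z)
    theta:     rotation angle in radians
    """
    x, y = joints_2d[:, 0], joints_2d[:, 1]
    z = depths
    # Rotation about the y-axis mixes the x and z coordinates.
    x_rot = np.cos(theta) * x + np.sin(theta) * z
    # Orthographic projection back to 2D simply drops the rotated depth.
    return np.stack([x_rot, y], axis=1)

# Toy usage: a 17-joint pose with random depths, rotated by 90 degrees.
rng = np.random.default_rng(0)
pose_2d = rng.standard_normal((17, 2))
pred_depth = rng.standard_normal(17)
projected = rotate_and_project(pose_2d, pred_depth, np.pi / 2)
print(projected.shape)  # (17, 2)
```

In an adversarial setup like the one described, the reprojected pose would be shown to a discriminator that judges whether it still looks like a plausible 2D pose; a projection that collapses after rotation signals incorrect depth predictions.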
Understanding how humans subjectively look at and evaluate images is an important task for various applications in the field of multimedia interaction. While it has been repeatedly pointed out that eye movements can be used to infer the internal states of humans, few successes have been reported concerning image understanding. In this paper, we investigate the possibility of estimating image preference from a person's eye movements in a supervised manner. A dataset of eye movements is collected while participants view pairs of natural images, and it is used to train image preference label classifiers. The input feature is defined as a combination of various fixation and saccade event statistics, and the use of the random forest algorithm allows us to quantitatively assess how each of the statistics contributes to the classification task. We show that the gaze-based classifier achieved higher accuracy than metadata-based baseline methods and a simple rule-based classifier throughout the experiments. We also present a quantitative comparison with image-based preference classifiers and discuss the potential and limitations of the gaze-based preference estimator.
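As a rough illustration of this pipeline, the following is a minimal sketch using scikit-learn's random forest. The eight gaze statistics and the synthetic data are placeholder assumptions standing in for the fixation and saccade event statistics described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-trial feature vector combining fixation and saccade
# statistics, e.g. [n_fixations, mean_fixation_duration, n_saccades,
# mean_saccade_amplitude, ...]; the exact features are assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))   # 200 viewing trials, 8 gaze statistics
y = rng.integers(0, 2, size=200)    # preferred image in the pair: 0 or 1

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Feature importances quantify how each statistic contributes to the
# preference classification, mirroring the analysis in the abstract.
print(clf.feature_importances_)
```

The appeal of a random forest here is exactly what the abstract points out: beyond classification accuracy, its per-feature importances make the contribution of each fixation and saccade statistic directly inspectable.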
Sensing of human motion is very important for human-computer interactive applications such as virtual reality, gesture recognition, and communication. A vision system is suitable for human-computer interaction because it involves passive sensing and can estimate the motion of the user without causing any discomfort. In this paper, we propose an algorithm for fast posture estimation of the user from an image sequence, using the position information of the hands and head. Our algorithm is based on a model-based method in which the parameters of a geometric model are calculated from human kinematics. Image information such as the positions of the head and hands is insufficient to determine a unique solution, so the unknown parameters are first predicted from motion models and the previous posture parameters, and then adjusted by a minimization method.
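The predict-then-adjust step can be illustrated with a minimal sketch: a planar two-link kinematic chain whose joint angles are initialized from a predicted value and refined by numerical minimization until the model's hand matches the observed hand position. The link lengths, the 2D chain, and all names are illustrative assumptions, not the paper's actual geometric model.

```python
import numpy as np
from scipy.optimize import minimize

l_upper, l_fore = 0.3, 0.25  # assumed upper-arm and forearm lengths (m)

def forward_kinematics(angles):
    """Hand position of a planar two-link chain given joint angles."""
    a1, a2 = angles
    x = l_upper * np.cos(a1) + l_fore * np.cos(a1 + a2)
    y = l_upper * np.sin(a1) + l_fore * np.sin(a1 + a2)
    return np.array([x, y])

def residual(angles, observed_hand):
    """Squared distance between the model's hand and the observation."""
    return np.sum((forward_kinematics(angles) - observed_hand) ** 2)

observed_hand = np.array([0.35, 0.30])  # hand position from the image
predicted = np.array([0.5, 0.5])        # initial guess from a motion model

result = minimize(residual, predicted, args=(observed_hand,))
print(result.x)  # adjusted joint angles
```

The sketch only shows the minimization stage; in the described system the initial guess would come from the motion models and the previous frame's posture, which keeps the adjustment fast and well-conditioned.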