“…There is a large body of work on vision-based human activity recognition for robots. These works infer the semantic label of an overall activity, or localize actions within a complex activity, to enable better human-robot interaction [20], [2], [10] and assistive robotics [14], [34]. Given input RGB/RGB-D videos [28], [17], [8], 3D human joint motions [21], [26], or data from other inertial/location sensors [9], [22], they train the perception model using fully or weakly labeled actions [17], [7], [13], or annotated locations of humans and their interactive objects [30], [24].…”