Haziq Razali scite author profile

We define action keystates as the start or end of an action that contains information such as the human pose and time. Existing methods that forecast the human pose use recurrent networks that input and output a sequence of poses. In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. In contrast to existing methods, ours uses an input from a single timestep to directly forecast (i) the key pose the instant the pick or place action is performed and (ii) the time it takes to get to the predicted key pose. Experimental results show that our method outperforms the state-of-the-art for key pose forecasting and is comparable for time forecasting while running at least an order of magnitude faster. Further ablative studies reveal the significance of the object of interest in enabling the total number of parameters across all existing methods to be reduced by at least 90% without any degradation in performance.

A Log-likelihood Regularized KL Divergence for Video Prediction With a 3D Convolutional Variational Recurrent Network

Razali, Fernando (2021)

Multitask Variational Autoencoding of Human-to-Human Object Handover

Razali, Demiris (2021)

Using Eye Gaze to Forecast Human Pose in Everyday Pick and Place Actions

Razali, Demiris (2022)

Collaborative robots that operate alongside humans require the ability to understand their intent and forecast their pose. Among the various indicators of intent, the eye gaze is particularly important as it signals action towards the gazed object. By observing a person's gaze, one can effectively predict the object of interest and subsequently forecast the person's pose. We leverage this and present a method that forecasts the human pose using gaze information for everyday pick and place actions in a home environment. Our method first attends to fixations to locate the coordinates of the object of interest before inputting said coordinates to a pose forecasting network. Experiments on the MoGaze dataset show that our gaze network lowers the errors of existing pose forecasting methods and that incorporating a prior in the form of textual instructions further lowers the errors by a significant amount. Furthermore, the use of eye gaze now allows a simple multilayer perceptron network to directly forecast the keypose.

Pedestrian intention prediction: A convolutional bottom-up multi-task approach

Using a Single Input to Forecast Human Action Keystates in Everyday Pick and Place Actions
