We define an action keystate as the start or end of an action, which carries information such as the human pose and time. Existing methods that forecast the human pose use recurrent networks that take a sequence of poses as input and produce a sequence of poses as output. In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. In contrast to existing methods, ours uses input from a single timestep to directly forecast (i) the key pose at the instant the pick or place action is performed and (ii) the time it takes to reach the predicted key pose. Experimental results show that our method outperforms the state of the art for key pose forecasting and is comparable for time forecasting, while running at least an order of magnitude faster. Further ablation studies reveal the significance of the object of interest: knowing it enables the total number of parameters across all existing methods to be reduced by at least 90% without any degradation in performance.
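The single-timestep formulation can be illustrated with a small sketch. The code below is not the authors' network; it is a minimal, untrained multilayer perceptron that assumes the input is one pose (a hypothetical 21 joints with 3D coordinates) concatenated with the known object position, and that the output is a key pose plus a scalar time. The class name `KeystateForecaster`, the layer sizes, and the joint count are all assumptions made for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a randomly initialised dense layer (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer to a flat input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softplus(v):
    """Numerically stable softplus; keeps the predicted time non-negative."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


class KeystateForecaster:
    """Hypothetical single-timestep forecaster (illustrative, untrained)."""

    def __init__(self, num_joints=21, hidden=64, seed=0):
        rng = random.Random(seed)
        self.num_joints = num_joints
        n_in = num_joints * 3 + 3       # one pose + object position
        n_out = num_joints * 3 + 1      # key pose + time to reach it
        self.hidden_layer = make_layer(n_in, hidden, rng)
        self.output_layer = make_layer(hidden, n_out, rng)

    def forecast(self, pose, object_xyz):
        """Map a single pose and the object position to (key_pose, time)."""
        flat = [c for joint in pose for c in joint] + list(object_xyz)
        out = linear(self.output_layer, relu(linear(self.hidden_layer, flat)))
        coords, raw_time = out[:-1], out[-1]
        key_pose = [coords[i:i + 3] for i in range(0, len(coords), 3)]
        return key_pose, softplus(raw_time)
```

The point of the sketch is the interface: no pose history is consumed and no pose sequence is rolled out, so a single forward pass yields both the key pose and the time estimate.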
Collaborative robots that operate alongside humans require the ability to understand their intent and forecast their pose. Among the various indicators of intent, eye gaze is particularly important as it signals action towards the gazed object. By observing a person's gaze, one can effectively predict the object of interest and subsequently forecast the person's pose. We leverage this and present a method that forecasts the human pose using gaze information for everyday pick and place actions in a home environment. Our method first attends to fixations to locate the coordinates of the object of interest before passing those coordinates to a pose forecasting network. Experiments on the MoGaze dataset show that our gaze network lowers the errors of existing pose forecasting methods and that incorporating a prior in the form of textual instructions further lowers the errors by a significant amount. Furthermore, the use of eye gaze allows a simple multilayer perceptron network to directly forecast the keypose.
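The idea of attending to fixations to locate the object of interest can be sketched as a soft attention over fixation points. This is an illustrative stand-in rather than the gaze network described above: in place of learned attention scores it assumes that fixation duration is a reasonable proxy for relevance, and the function name `attend_to_fixations` and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_fixations(fixations, durations, temperature=0.1):
    """Estimate object coordinates as an attention-weighted mean of fixations.

    fixations: list of (x, y, z) gaze points, one per fixation.
    durations: fixation durations in seconds (assumed proxy for a learned score).
    temperature: smaller values concentrate weight on the longest fixation.
    """
    weights = softmax([d / temperature for d in durations])
    estimate = tuple(
        sum(w * point[axis] for w, point in zip(weights, fixations))
        for axis in range(3)
    )
    return estimate, weights
```

The resulting coordinate estimate is what would then be passed, together with the current pose, to a pose forecasting network.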