ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747710

Using a Single Input to Forecast Human Action Keystates in Everyday Pick and Place Actions

Abstract: We define action keystates as the start or end of an action that contains information such as the human pose and time. Existing methods that forecast the human pose use recurrent networks that input and output a sequence of poses. In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. In contrast to existing methods, ours uses an input from a single timestep to directly forecast (i) the key pose the instant the pick or place action is performed an…

Cited by 2 publications (2 citation statements)
References 18 publications
“…Early deterministic models tend to use recurrent networks such as Gated Recurrent Units (GRU) (Chung et al 2014) or fully convolutional layers. Several other works incorporate additional context such as eye gaze (Razali and Demiris 2021) or the object coordinates (Razali and Demiris 2022;Taheri et al 2022). Lastly, the context-aware model (Corona et al 2020) forecasts both the human pose and object motion and is related to our work, although there exist several notable differences.…”
Section: Related Work
mentioning confidence: 94%
“…These works can also be categorized into deterministic (Martinez, Black, and Romero 2017) or stochastic (Liu et al 2021), using Variational Autoencoders (VAEs) (Kingma and Welling 2013) or Generative Adversarial Networks (GANs) (Goodfellow et al 2014) respectively, with the design choice hinging on whether there is sufficient variation to be learnt by the model. Many recent works incorporate additional context such as scene (Corona et al 2020), eye gaze (Razali and Demiris 2022b;Zheng et al 2022), or object coordinates (Razali and Demiris 2022a).…”
Section: Related Work
mentioning confidence: 99%