Deep Representation Learning for Human Motion Prediction and Classification

Bütepage, Judith; Black, Michael J.; Kragić, Danica; Kjellström, Hedvig

doi:10.1109/cvpr.2017.173

Cited by 359 publications

(287 citation statements)

References 17 publications

Mentioning: 287

“…Bütepage et al [2] propose to encode poses with a hierarchy of dense layers following the kinematic chain starting from the end-effectors (dubbed H-TE), which is similar to our SP-layer. In contrast to this work, H-TE operates on the input rather than the output, and has only been demonstrated with non-recurrent networks when using 3D positions to parameterize the poses.…”

Section: Related Work (mentioning)

confidence: 99%

“…Bütepage et al [2,3] and Holden et al [10] convert the data directly to 3D joint positions. These works do not use recurrent structures, which necessitates the extraction of fixed-size, temporal windows for training.…”

Section: Related Work (mentioning)

confidence: 99%
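
The fixed-size temporal windows mentioned in the statement above can be illustrated with a short sketch. It is not taken from any of the cited papers: the function name `extract_windows`, the `(frames, joints, 3)` array layout, and the window and stride values are all assumptions chosen for illustration.

```python
import numpy as np

def extract_windows(sequence, window, stride=1):
    """Cut a (T, J, 3) pose sequence into overlapping (window, J, 3) clips.

    Hypothetical helper: a non-recurrent model needs inputs of one fixed
    length, so a long recording is sliced into equally sized windows.
    """
    num_frames = sequence.shape[0]
    if window > num_frames:
        raise ValueError("window is longer than the sequence")
    # Start index of every window that fits entirely inside the sequence.
    starts = range(0, num_frames - window + 1, stride)
    return np.stack([sequence[s:s + window] for s in starts])

# Assumed toy recording: 10 frames, 4 joints, 3D joint positions.
poses = np.arange(10 * 4 * 3, dtype=np.float32).reshape(10, 4, 3)
clips = extract_windows(poses, window=5, stride=1)
print(clips.shape)  # (6, 5, 4, 3)
```

A larger stride reduces the overlap between neighbouring windows, which gives fewer training samples that are less redundant.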

“…These works do not use recurrent structures, which necessitates the extraction of fixed-size, temporal windows for training. [2] and [10] focus on learning of latent representations, which are shown to be helpful for various tasks, such as denoising, forecasting, or motion generation along a given trajectory [9]. [3] extends [2] by applying a conditional variational autoencoder (VAE) to the task of online motion prediction in human-robot interactions.…”

Section: Related Work (mentioning)

confidence: 99%

Structured Prediction Helps 3D Human Motion Modelling

Aksan, Kaufmann, Hilliges (2019). 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Figure 1: We introduce a structured prediction layer (SPL) to the task of 3D human motion modelling. The SP-layer explicitly decomposes the pose into individual joints and can be interfaced with a variety of baseline architectures. We show that on H3.6M and a recent, much larger dataset, AMASS, a variety of baseline models benefit when augmented with an SP-layer.

Abstract: Human motion prediction is a challenging and important task in many computer vision application domains. Existing work only implicitly models the spatial structure of the human skeleton. In this paper, we propose a novel approach that decomposes the prediction into individual joints by means of a structured prediction layer that explicitly models the joint dependencies. This is implemented via a hierarchy of small-sized neural networks connected analogously to the kinematic chains in the human body as well as a joint-wise decomposition in the loss function. The proposed layer is agnostic to the underlying network and can be used with existing architectures for motion modelling. Prior work typically leverages the H3.6M dataset. We show that some state-of-the-art techniques do not perform well when trained and tested on AMASS, a recently released dataset 14 times the size of H3.6M. Our experiments indicate that the proposed layer increases the performance of motion forecasting irrespective of the base network, joint-angle representation, and prediction horizon. We furthermore show that the layer also improves motion predictions qualitatively. We make code and models publicly available at https://ait.ethz.ch/projects/2019/spl.

“…Earlier works model the synthesis of human motion using techniques such as Hidden Markov Models [7], linear dynamical systems [40], bilinear spatiotemporal basis models [2], and Gaussian process latent variable models [45,56] and other variants [22,55]. More recently, there are deep learning-based approaches that use recurrent neural networks (RNNs) to predict 3D future human motion from past 3D human skeletons [18,24,9,32,47]. All of these approaches operate in the domain where the inputs are 3D past motion capture sequences.…”

Section: Related Work (mentioning)

confidence: 99%

Predicting 3D Human Dynamics From Video

Zhang, Felsen, Kanazawa, et al. (2019). 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Figure 1: Autoregressive prediction of human 3D motion from video. We present Predicting Human Dynamics (PHD), a neural autoregressive framework that takes past video frames as input to predict the motion of a 3D human body model. As shown, PHD takes in a video sequence of a person and predicts the future 3D human motion. We show the predictions from two different viewpoints.

Abstract: Given a video of a person in action, we can easily guess the 3D future motion of the person. In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input. We do this for periodic motions such as walking and also actions like bowling and squatting seen in sports or workout videos. While there has been a surge of future prediction problems in computer vision, most approaches predict 3D future from 3D past or 2D future from 2D past inputs. In this work, we focus on the problem of predicting 3D future motion from past image sequences, which has a plethora of practical applications in autonomous systems that must operate safely around people from visual inputs. Inspired by the success of autoregressive models in language modeling tasks, we learn an intermediate latent space on which we predict the future. This effectively facilitates autoregressive predictions when the input differs from the output domain. Our approach can be trained on video sequences obtained in-the-wild without 3D ground truth labels. The project website with videos can be found at https

“…Recurrent neural networks have been effectively utilized for sequence prediction in multiple fields [74], [75], [76], [77]. We adapted the convolutional RNN autoencoder model to a sequence prediction model by removing the pooling layers and fully-connected layers and altering the number of nodes in the central LSTM layer (Fig.…”

Section: G RNNs Predict Muscle Stem Cell Motility (mentioning)

confidence: 99%

Deep convolutional and recurrent neural networks for cell motility discrimination and prediction

Kimmel, Brack, Marshall (2017). Preprint

Cells in culture display diverse motility behaviors that may reflect differences in cell state and function, providing motivation to discriminate between different motility behaviors. Current methods to do so rely upon manual feature engineering. However, the types of features necessary to distinguish between motility behaviors can vary greatly depending on the biological context, and it is not always clear which features may be most predictive in each setting for distinguishing particular cell types or disease states. Convolutional neural networks (CNNs) are machine learning models allowing for relevant features to be learned directly from spatial data. Similarly, recurrent neural networks (RNNs) are a class of models capable of learning long-term temporal dependencies. Given that cell motility is inherently spatiotemporal data, we present an approach utilizing both convolutional and long short-term memory (LSTM) recurrent neural network units to analyze cell motility data. These RNN models provide accurate classification of simulated motility and experimentally measured motility from multiple cell types, comparable to results achieved with hand-engineered features. The variety of cell motility differences we can detect suggests that the algorithm is generally applicable to additional cell types not analyzed here. RNN autoencoders based on the same architecture are capable of learning motility features in an unsupervised manner and capturing variation between myogenic cells in the latent space. When adapted to motility prediction, these RNN models are capable of predicting muscle stem cell motility from past tracking data with performance superior to standard motion prediction models. This advance in cell motility prediction may be of practical utility in cell tracking applications.
