MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

Dang, Lingwei; Nie, Yongwei; Long, Chengjiang; Zhang, Qing; Li, Guiqing

doi:10.1109/iccv48922.2021.01127

Cited by 175 publications

(132 citation statements)

References 39 publications

Supporting

Mentioning

132

Contrasting

“…Ablation comparisons show that the prediction between the extended sequences is easier than between the original sequences, and the former achieves significantly better prediction accuracy than the latter. Dang et al [10] ascribed this to the global residual connection between the extended input and output, while in this paper we interpret this phenomenon from another perspective: the last observed pose provides an "initial guess" for the target future poses. From the initial guess, the network just needs to move slightly such that it can reach the target positions.…”

Section: Introduction (mentioning)

confidence: 79%

“…The works of [23,27,28] use GCN either in the encoder [27,28] for feature encoding or in the decoder [23] for better decoding. The works of [9,10,32,33] are totally based on GCN. Mao et al [33] viewed a pose as a fully-connected graph and used GCN to discover the relationship between any pair of joints.…”

Section: Related Work (mentioning)

confidence: 99%

“…In the temporal domain, they represented the joint trajectories by Discrete Cosine Transform coefficients. Dang et al [10] extended [33] to a multiscale version across the abstraction levels of human pose. We also use GCN as the basic building block, but propose S-DGCN and T-DGCN that extract global spatiotemporal features, better than [10,32,33] that just extract spatial features.…”

Section: Related Work (mentioning)

confidence: 99%
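As a rough illustration of what "representing a joint trajectory by Discrete Cosine Transform coefficients" means, the sketch below applies an orthonormal DCT-II to the values of a single joint coordinate over time. It is a generic, pure-Python sketch and not code from any of the cited papers; the function names `dct_ii` and `idct_ii` and the sample trajectory are assumptions made for this example.

```python
import math
from typing import List


def dct_ii(signal: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D trajectory (one joint coordinate over time)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        # Orthonormal scaling: the k = 0 term uses sqrt(1/n), all others sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                    for t, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    signal = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        signal.append(total)
    return signal


# Hypothetical trajectory: one coordinate of one joint over six frames.
trajectory = [0.0, 0.1, 0.25, 0.45, 0.7, 1.0]
coeffs = dct_ii(trajectory)

# Keeping every coefficient reproduces the trajectory up to rounding error.
reconstructed = idct_ii(coeffs)

# Zeroing the high-frequency half keeps only the smooth part of the motion.
smoothed = idct_ii(coeffs[:3] + [0.0] * 3)
```

A model working in this representation operates on `coeffs` per joint coordinate rather than on the raw frame-by-frame values, and the inverse transform maps its output back to poses.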

“…We observe that starting from the seminal work of LTD [33], all recent GCN-based approaches [9,10,32,40] share the following preprocessing steps: (1) They duplicate the last observed pose as many times as the length of the future pose sequence, and append the duplicated poses to the observed sequence to form an extended input sequence.…”

Section: Introduction (mentioning)

confidence: 99%
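The padding step quoted above, together with the global residual connection mentioned in the first statement, can be sketched in a few lines. This is a minimal pure-Python sketch under stated assumptions: poses are flat lists of floats, the helper names `extend_with_last_pose` and `add_global_residual` are invented for this example, and the `offsets` list stands in for whatever a network would actually predict.

```python
from typing import List

Pose = List[float]  # one flattened pose: joint coordinates for a single frame


def extend_with_last_pose(observed: List[Pose], future_len: int) -> List[Pose]:
    """Append `future_len` copies of the last observed pose to the sequence."""
    if not observed:
        raise ValueError("observed sequence must contain at least one pose")
    if future_len < 0:
        raise ValueError("future_len must be non-negative")
    last = observed[-1]
    # Copy every pose so later in-place edits do not alias one another.
    return [list(p) for p in observed] + [list(last) for _ in range(future_len)]


def add_global_residual(extended: List[Pose], offsets: List[Pose]) -> List[Pose]:
    """Add per-frame offsets to the extended input (a global residual connection)."""
    if len(extended) != len(offsets):
        raise ValueError("extended input and offsets must have the same length")
    return [[x + d for x, d in zip(pose, delta)]
            for pose, delta in zip(extended, offsets)]


# Three observed frames of a toy two-coordinate pose, two future frames to predict.
observed = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
extended = extend_with_last_pose(observed, future_len=2)

# Hypothetical network output: no change on observed frames, small moves afterwards.
offsets = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.5, 0.25], [1.0, 0.5]]
predicted = add_global_residual(extended, offsets)
```

Because the padded frames already equal the last observed pose, the offsets only have to encode how far each future pose moves away from it, which is the "initial guess" reading given in the first citation statement.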

“…Existing GCN-based approaches [9,10,33] • We conduct extensive experiments showing that our method outperforms previous approaches by large margins on three public datasets.…”

Section: Introduction (mentioning)

confidence: 99%

Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction

Ma, Nie, Long et al. 2022

Preprint

Self Cite

This paper presents a high-quality human motion prediction method that accurately predicts future human poses given observed ones. Our method is based on the observation that a good "initial guess" of the future poses is very helpful in improving the forecasting accuracy. This motivates us to propose a novel two-stage prediction framework, including an init-prediction network that just computes the good guess and then a formal-prediction network that predicts the target future poses based on the guess. More importantly, we extend this idea further and design a multi-stage prediction framework where each stage predicts an initial guess for the next stage, which brings more performance gain. To fulfill the prediction task at each stage, we propose a network comprising Spatial Dense Graph Convolutional Networks (S-DGCN) and Temporal Dense Graph Convolutional Networks (T-DGCN). Alternately executing the two networks helps extract spatiotemporal features over the global receptive field of the whole pose sequence. Together, these design choices make our method outperform previous approaches by large margins: 6%-7% on Human3.6M, 5%-10% on CMU-MoCap, and 13%-16% on 3DPW. Code is available at https://github.com/705062791/PGBIG.

Rethinking Learning Approaches for Long-Term Action Anticipation

Nawhal, Jyothi, Mori 2022

Lecture Notes in Computer Science

Action anticipation involves predicting future actions having observed the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity in the video, which is then used for future prediction. We introduce Anticipatr which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.
