MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions

Wu, Haixu; Yao, Zhiyu; Wang, Jianmin; Long, Mingsheng

doi:10.1109/cvpr46437.2021.01518

Cited by 119 publications

(65 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This feature allows them to be trained on sequences to reflect the underlying temporal dynamics. Thus, recurrent structures have become popular solutions for video prediction [27], embedded dynamics [28], trajectory forecasting [29] and translation [30], etc.…”

Section: Related Workmentioning

confidence: 99%

A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data

Wang¹,

Aanjaneya²,

Bekris³

2022

Preprint

View full text Add to dashboard Cite

Section: Related Workmentioning

confidence: 99%

A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data

Wang¹,

Aanjaneya²,

Bekris³

2022

Preprint

View full text Add to dashboard Cite

“…Guen et al proposed the PhyCell [19] to disentangle physical dynamics from unknown factors to predict more reliable motions. And Wu et al [20] proposed the Motion-GRU to independently model the transient variation and motion trend for more satisfactory predictions.…”

Section: Related Workmentioning

confidence: 99%

“…Some works [17], [18] attempted to utilize the preserved visual details during the feature extraction to augment the visual quality of the predictions. [19], [20] have explored to disentangle the physical dynamics (motion patterns) to help predict more satisfactory human actions. In addition, some works [21], [22], [23], [24], [25], [26] have begun to augment the long-term memorizing ability for LSTMs with the help of the attention mechanism, which can also help broaden the spatiotemporal receptive field of the predictive units.…”

Section: Introductionmentioning

confidence: 99%

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

Chang¹,

Zhang²,

Wang³

et al. 2022

Preprint

View full text Add to dashboard Cite

Video prediction aims to predict future frames by modeling the complex spatiotemporal dynamics in videos. However, most of the existing methods only model the temporal information and the spatial information for videos in an independent manner but haven't fully explored the correlations between both terms. In this paper, we propose a SpatioTemporal-Aware Unit (STAU) for video prediction and beyond by exploring the significant spatiotemporal correlations in videos. On the one hand, the motion-aware attention weights are learned from the spatial states to help aggregate the temporal states in the temporal domain. On the other hand, the appearance-aware attention weights are learned from the temporal states to help aggregate the spatial states in the spatial domain. In this way, the temporal information and the spatial information can be greatly aware of each other in both domains, during which, the spatiotemporal receptive field can also be greatly broadened for more reliable spatiotemporal modeling. Experiments are not only conducted on traditional video prediction tasks but also other tasks beyond video prediction, including the early action recognition and object detection tasks. Experimental results show that our STAU can outperform other methods on all tasks in terms of performance and computation efficiency.

show abstract

“…wider (e.g., TrajGRU [14], PredRNN [7], MIM [15], and SA-ConvLSTM [10]) or deeper (e.g., PredRNN++ [9] and MotionRNN [18]). As a result, they bring limited improvement in model performance but introduce significant growth in GPU memory usage.…”

Section: Introductionmentioning

confidence: 99%

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Ma¹,

Zhang²,

Liu³

2022

Preprint

View full text Add to dashboard Cite

Spatiotemporal predictive learning, which predicts future frames through historical prior knowledge with the aid of deep learning, is widely used in many fields. Previous work essentially improves the model performance by widening or deepening the network, but it also brings surging memory overhead, which seriously hinders the development and application of this technology. In order to improve the performance without increasing memory consumption, we focus on scale, which is another dimension to improve model performance but with low memory requirement. The effectiveness has been widely proved in many CNN-based tasks such as image classification and semantic segmentation, but it has not been fully explored in recent RNN models. In this paper, learning from the benefit of multi-scale, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models for spatiotemporal predictive learning. By integrating different scales, we enhance the existing models with both improved performance and greatly reduced overhead. We verify our MS-RNN framework by exhaustive experiments with 6 popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, and MotionRNN) on 4 different datasets (Moving MNIST, KTH, TaxiBJ, and HKO-7). The results show the efficiency that the RNN models incorporating our framework have much lower memory cost but better performance than before. Our code is released at https://github.com/mazhf/MS-RNN.

show abstract

MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions

Cited by 119 publications

References 9 publications

A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data

A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Contact Info

Product

Resources

About