Im2Flow: Motion Hallucination from Static Images for Action Recognition

Gao, Ruohan; Xiong, Bo; Grauman, Kristen

doi:10.1109/cvpr.2018.00622

Cited by 83 publications

(113 citation statements)

References 77 publications

Supporting

Mentioning

112

Contrasting


“…In the last years CNNs have successfully been trained to estimate the optical flow, including FlowNet [9,18], SpyNet [34] and PWC-Net [45], and achieve low End-Point Error (EPE) on challenging benchmarks, such as MPI Sintel [4] and KITTI 2015 [31]. Im2Flow work [13] also shows optical flow can be hallucinated from still images. Recent work however, shows that accuracy of optical flow does not strongly correlate with accuracy of video recognition [36].…”

Section: Motion Representation and Optical Flow Estimation (mentioning)

confidence: 99%
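The End-Point Error (EPE) mentioned in the statement above is commonly computed as the average Euclidean distance between predicted and ground-truth flow vectors. The following is a minimal sketch of that standard definition in plain Python. The function name `end_point_error` and the list-of-tuples flow layout are illustrative assumptions, not taken from any of the cited papers or benchmarks.

```python
import math


def end_point_error(pred, gt):
    """Average Euclidean distance between paired (u, v) flow vectors.

    `pred` and `gt` are equal-length sequences of (u, v) tuples; this flat
    layout is an assumption made for illustration.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of vectors")
    if not pred:
        raise ValueError("flow fields must not be empty")
    # Per-pixel distance between predicted and ground-truth displacement.
    distances = [
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    ]
    return sum(distances) / len(distances)


# Two pixels: one is off by the vector (3, 4), the other matches exactly.
example_epe = end_point_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

A lower value means the predicted flow is closer to the reference flow. As the quoted statement notes, a low EPE does not necessarily translate into better video recognition accuracy.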


DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Shou, Lin, Kalantidis, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

139


Motion has shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in the compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy since the motion vector is noisy and has substantially reduced resolution, which makes it a less discriminative motion representation. To remedy these issues, we propose a lightweight generator network, which reduces noises in motion vectors and captures fine motion details, achieving a more Discriminative Motion Cue (DMC) representation. Since optical flow is a more accurate motion representation, we train the DMC generator to approximate flow using a reconstruction loss and an adversarial loss, jointly with the downstream action classification task. Extensive evaluations on three action recognition benchmarks (HMDB-51, UCF-101, and a subset of Kinetics) confirm the effectiveness of our method. Our full system, consisting of the generator and the classifier, is coined as DMC-Net which obtains high accuracy close to that of using flow and runs two orders of magnitude faster than using optical flow at inference time.


“…First, we minimize the per-pixel difference between the generated DMC and its corresponding optical flow. Following Im2Flow [13] which approximates flow from a single RGB image, we use the Mean Square Error (MSE) reconstruction loss L_mse defined as:…”

Section: Optical Flow Reconstruction Loss (mentioning)

confidence: 99%
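The quoted statement stops before the loss is written out, so the exact formula used by the citing paper is not reproduced here. As a hedged illustration only, a generic per-pixel mean squared error between a generated motion field and a reference optical flow field can be sketched as below. The function name `mse_loss` and the nested-list H x W x 2 layout are assumptions for this example, and the normalization may differ from the one in the paper.

```python
def mse_loss(generated, target):
    """Generic mean squared error between two equally shaped flow fields.

    Each field is a nested list indexed as [row][column][channel], with two
    channels (horizontal and vertical displacement). This layout is an
    assumption made for illustration.
    """
    total = 0.0
    count = 0
    for row_gen, row_tgt in zip(generated, target):
        for pixel_gen, pixel_tgt in zip(row_gen, row_tgt):
            for value_gen, value_tgt in zip(pixel_gen, pixel_tgt):
                # Accumulate the squared difference for every channel value.
                total += (value_gen - value_tgt) ** 2
                count += 1
    if count == 0:
        raise ValueError("flow fields must not be empty")
    return total / count


# A 1 x 2 field: squared differences are 1, 0, 0 and 4 over four values.
generated_field = [[(1.0, 0.0), (0.0, 0.0)]]
reference_flow = [[(0.0, 0.0), (0.0, 2.0)]]
example_loss = mse_loss(generated_field, reference_flow)
```

In a training setup such as the one the statement describes, a loss of this kind would be minimized so that the generated motion cue approaches the reference optical flow.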

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Shou, Lin, Kalantidis, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

139


“…Some methods represent motion information only by RGB frame [13,14]. (Gao, Ruohan et al., 2018) [15] considered that a static image can produce fake motion, thus predict optical flow fields through pre-trained Im2Flow network. [16] used to hallucinate optical flow images from videos.…”

Section: Related Work (mentioning)

confidence: 99%

Bypass Enhancement RGB Stream Model for Pedestrian Action Recognition of Autonomous Vehicles

Cao 2020

Communications in Computer and Information Science


Pedestrian action recognition and intention prediction is one of the core issues in the field of autonomous driving. In this research field, action recognition is one of the key technologies. A large number of scholars have done a lot of work to improve the accuracy of the algorithm for the task. However, there are relatively few studies and improvements in the computational complexity of algorithms and system real-time. In the autonomous driving application scenario, the real-time performance and ultra-low latency of the algorithm are extremely important evaluation indicators, which are directly related to the availability and safety of the autonomous driving system. To this end, we construct a bypass enhanced RGB flow model, which combines the previous two-branch algorithm to extract RGB feature information and optical flow feature information respectively. In the training phase, the two branches are merged by distillation method, and the bypass enhancement is combined in the inference phase to ensure accuracy. The real-time behavior of the behavior recognition algorithm is significantly improved on the premise that the accuracy does not decrease. Experiments confirm the superiority and effectiveness of our algorithm.


“…Optical flow prediction from a single image has been studied with various approaches. Supervised approaches using CNNs have also been proposed [Gao et al 2017; Walker et al 2015]. The point is how to prepare ground-truth flow fields for supervised learning.…”

Section: Optical Flow Prediction (mentioning)

confidence: 99%

“…Training. A straightforward way for training the motion predictor is to minimize the difference between inferred and ground-truth flow fields, as done in [Gao et al 2017; Li et al 2018; Walker et al 2015]. Our motion predictor, in contrast, learns future flow fields in a self-supervised manner only from time-lapse videos that have no ground-truth.…”

Section: Motion Predictor (mentioning)

confidence: 99%

Animating landscape

2019


Fig. 1. Given a single scenery image, our method predicts the motion (e.g., moving clouds) and appearance (e.g., time-varying colors) separately to generate a cyclic animation via self-supervised learning of time-lapse videos using our convolutional neural networks that infer backward flow fields (insets) and color transfer functions for converting the input image. The flow fields are visualized using the colormap shown in Figures 8 and 9. The output frame size is 1,024 × 576. Please see the supplemental video for the resultant animations. Input photo: Pixabay/Pexel.com.

Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models. This paper proposes a method that can create a high-resolution, long-term animation using convolutional neural networks (CNNs) from a single landscape image where we mainly focus on skies and waters. Our key observation is that the motion (e.g., moving clouds) and appearance (e.g., time-varying colors in the sky) in natural scenes have different time scales. We thus learn them separately and predict them with decoupled control while handling future uncertainty in both predictions by introducing latent codes. Unlike previous methods that infer output frames directly, our CNNs predict spatially-smooth intermediate data, i.e., for motion, flow fields for warping, and for appearance, color transfer maps, via self-supervised learning, i.e., without explicitly-provided ground truth. These intermediate data are applied not to each previous output frame, but to the input image only once for each output frame. This design is crucial to alleviate error accumulation in long-term predictions, which is the essential problem in previous recurrent approaches. The output frames can be looped like cinemagraph, and also be controlled directly by specifying latent codes or indirectly via visual annotations.
We demonstrate the effectiveness of our method through comparisons with the state-of-the-arts on video prediction as well as appearance manipulation. Resultant videos, codes, and datasets will be available at
