Multi-Frame Pyramid Refinement Network for Video Frame Interpolation

Zhang, Haoxian; Wang, Ronggang; Zhao, Yang

doi:10.1109/access.2019.2940510

Cited by 18 publications

(8 citation statements)

References 44 publications


“…Most state-of-the-art optical flow models use deep learning [25], which suggests that CNNs can capture motion information between frames. To obtain better results, many researchers combine optical flow estimation and video frame interpolation in a single model [2], [6], [7], [26]–[28]. Liu et al. [6] design a deep network with a voxel flow layer that synthesizes video frames by flowing pixel values from the input video volume.…”

Section: Related Work (mentioning)

confidence: 99%
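The voxel flow idea in the excerpt above can be illustrated with a small pure-Python sketch. It is not Liu et al.'s implementation: it assumes a symmetric convention in which each output pixel carries a hypothetical voxel flow `(dx, dy, dt)`, reads frame 0 at `(x - dx, y - dy)` and frame 1 at `(x + dx, y + dy)` with bilinear sampling, and blends the two samples with temporal weight `dt`. Images are grayscale lists of rows, and out-of-range coordinates are clamped to the border.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image indexed as img[y][x]
Flow = List[List[Tuple[float, float, float]]]  # per-pixel (dx, dy, dt), hypothetical layout


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a real-valued (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def voxel_flow_synthesis(frame0: Image, frame1: Image, flow: Flow) -> Image:
    """Synthesize the in-between frame from a voxel flow field.

    For each output pixel (x, y) with flow[y][x] = (dx, dy, dt):
    read frame0 at (x - dx, y - dy), read frame1 at (x + dx, y + dy),
    and blend the two samples linearly in time with weight dt.
    """
    h, w = len(frame0), len(frame0[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy, dt = flow[y][x]
            p0 = bilinear_sample(frame0, x - dx, y - dy)
            p1 = bilinear_sample(frame1, x + dx, y + dy)
            out[y][x] = (1 - dt) * p0 + dt * p1
    return out
```

For instance, with a single-row frame pair in which a bright pixel moves from x = 1 to x = 3, a uniform flow of `(1.0, 0.0, 0.5)` places it at x = 2 in the synthesized frame. In a learned model the flow field would be predicted by the network rather than supplied by hand.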

“…Niklaus and Liu [27] combine pixel-wise contextual information extracted by a pre-trained network with estimated bidirectional flow, and use a frame synthesis network to produce the interpolated frame in a context-aware fashion. Zhang et al. [7] use a 3D U-Net feature extractor to exploit spatio-temporal context and reconstruct texture, together with a coarse-to-fine architecture to improve optical flow estimation. Li et al. [28] propose a lightweight network that estimates optical flow at the feature level and introduce a new Sobolev loss to achieve better results.…”

Section: Related Work (mentioning)

confidence: 99%

“…Most existing video interpolation methods use convolutional neural networks (CNNs). These methods fall into two categories: those based on estimating interpolation convolution kernels [4], [5] and those based on optical flow estimation [2], [6], [7]. The first technique combines motion estimation and pixel synthesis into a single process, producing a convolution kernel for each pixel.…”

Section: Introduction (mentioning)

confidence: 99%
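The kernel-based category described in the excerpt above can be sketched as follows, loosely in the spirit of adaptive-convolution approaches. This is a hypothetical illustration, not the method of [4] or [5]: it assumes a network has already predicted one odd-sized square kernel per output pixel for each input frame (`k1`, `k2`), and each output pixel is the kernel-weighted sum of the co-located patches in the two frames. Because the kernels vary per pixel, they encode both the motion (where weight is placed) and the resampling (how the samples are mixed).

```python
from typing import List

Image = List[List[float]]   # grayscale image indexed as img[y][x]
Kernel = List[List[float]]  # k x k weights, k odd


def pixel_at(img: Image, x: int, y: int) -> float:
    """Read a pixel with replicate (clamp-to-edge) padding."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def kernel_synthesis(frame1: Image, frame2: Image,
                     k1: List[List[Kernel]], k2: List[List[Kernel]]) -> Image:
    """Interpolate a frame with spatially varying kernels.

    k1[y][x] and k2[y][x] are the (hypothetical) per-pixel kernels a network
    would predict. Each output pixel is the patch of frame1 around (x, y)
    weighted by k1[y][x], plus the patch of frame2 weighted by k2[y][x].
    """
    h, w = len(frame1), len(frame1[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ka, kb = k1[y][x], k2[y][x]
            r = len(ka) // 2  # kernel radius
            acc = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    acc += ka[j + r][i + r] * pixel_at(frame1, x + i, y + j)
                    acc += kb[j + r][i + r] * pixel_at(frame2, x + i, y + j)
            out[y][x] = acc
    return out
```

With one-hot 3x3 kernels that put weight 0.5 one pixel to the left in the first frame and one pixel to the right in the second, a bright pixel at x = 0 in the first frame and x = 2 in the second is reproduced at x = 1. A trade-off of this design is that motion larger than the kernel radius cannot be captured.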


Multi-Scale Attention Generative Adversarial Networks for Video Frame Interpolation

Xiao

2020

IEEE Access


Video frame interpolation is a fundamental task in computer vision. Recent methods usually apply convolutional neural networks to generate an intermediate frame from two consecutive input frames. However, existing methods sometimes fail to handle complex motion and long-range dependencies. In this paper, a multi-scale dense attention generative adversarial network is proposed. First, a multi-scale generative adversarial framework is established for video frame interpolation; generators operating from coarse to fine can better combine global and local information. Second, an attention module introduced into the generator makes the network focus accurately on moving objects. Third, a sequence discriminator is designed to improve the ability to capture spatial and temporal consistency in the frame sequence. Ablation experiments demonstrate the effectiveness of these three contributions, and results on several datasets show that our approach attains higher performance and produces more photo-realistic in-between frames than previous works.

INDEX TERMS Video frame interpolation, generative adversarial networks, multi-scale pyramid, spatial and temporal consistency, sequence discriminator.

JIAN XIAO received the B.S. degree in communication engineering from Northeast Electric Power University, China, in 2015. He is currently pursuing the Ph.D. degree in information and communication engineering with Harbin Engineering University. His research interests include image processing, generative adversarial networks, video analysis, and computer vision. XIAOJUN BI received the Ph.D. degree from the



“…However, objects often follow complex, non-linear trajectories. To this end, researchers have recently focused on leveraging information from more than two neighboring frames [7,8,26,50,53].…”

Section: Introduction (mentioning)

confidence: 99%

Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions

Dutta

Subramaniam

Mittal

2022

Preprint


Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video. It has a wide range of applications, including slow-motion video generation, frame-rate up-conversion, and video codec development. Some older works tackled this problem by assuming per-pixel linear motion between video frames. However, objects often follow non-linear motion patterns in the real world, and some recent methods attempt to model per-pixel motion with non-linear (e.g., quadratic) models. A quadratic model can also be inaccurate, especially in the case of motion discontinuities over time (i.e., sudden jerks) and occlusions, where some of the flow information may be invalid or inaccurate.

In our paper, we propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used. Specifically, we are able to softly switch between a linear and a quadratic model. Towards this end, we use an end-to-end 3D CNN encoder-decoder architecture over bidirectional optical flows and occlusion maps to estimate the non-linear motion model of each pixel. Further, a motion refinement module is employed to refine the non-linear motion, and the interpolated frames are estimated by simply warping the neighboring frames with the estimated per-pixel motion. Through a set of comprehensive experiments, we validate the effectiveness of our model and show that our method outperforms state-of-the-art algorithms on four datasets (Vimeo, DAVIS, HD and GoPro).
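The linear and quadratic per-pixel motion models contrasted in this abstract can be written in closed form. The sketch below assumes the standard constant-acceleration fit used in quadratic interpolation work, in which the displacement of a frame-0 pixel to time t in (0, 1) is derived from its flows toward frames 1 and -1, and it models the "soft switch" as a plain convex blend with weight `alpha`. The function names and the blending rule are illustrative assumptions; in the paper the choice is made per pixel by a 3D CNN, and its exact formulation may differ.

```python
from typing import Tuple

Vec = Tuple[float, float]  # (u, v) displacement of one pixel, in pixels


def linear_flow(f_0_1: Vec, t: float) -> Vec:
    """Constant-velocity model: displacement to time t is t * f_{0->1}."""
    return (t * f_0_1[0], t * f_0_1[1])


def quadratic_flow(f_0_1: Vec, f_0_m1: Vec, t: float) -> Vec:
    """Constant-acceleration model fitted to flows toward frames 1 and -1.

    With x(t) = x0 + v*t + a*t^2/2:  f_{0->1} = v + a/2  and  f_{0->-1} = -v + a/2,
    so v = (f_{0->1} - f_{0->-1}) / 2 and a/2 = (f_{0->1} + f_{0->-1}) / 2.
    """
    return tuple(
        (f1 - fm1) / 2 * t + (f1 + fm1) / 2 * t * t
        for f1, fm1 in zip(f_0_1, f_0_m1)
    )


def blended_flow(f_0_1: Vec, f_0_m1: Vec, t: float, alpha: float) -> Vec:
    """Hypothetical soft switch: alpha = 0 is linear, alpha = 1 is quadratic."""
    lin = linear_flow(f_0_1, t)
    quad = quadratic_flow(f_0_1, f_0_m1, t)
    return tuple((1 - alpha) * lc + alpha * qc for lc, qc in zip(lin, quad))
```

For a pixel moving at constant velocity (flows of +2 and -2 pixels), both models agree at t = 0.5. For an accelerating pixel with flows of +3 and -1, the linear model predicts 1.5 pixels while the quadratic model predicts 1.25, and `alpha = 0.5` lands halfway between them at 1.375.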


“…Many recently published papers focus on motion analysis. For example, FI-NET [2] computes optical flow at the feature level instead of the image level to make motion estimation more accurate; [3] learns latent motion features instead of using optical flow as the motion feature; and [4] and [5] use four input frames instead of two and add techniques such as long short-term memory (LSTM) and multi-frame pyramid refinement to predict motion.…”

Section: Introduction (mentioning)

confidence: 99%

DRVI: Dual Refinement for Video Interpolation

Zhou

Basu

2021

IEEE Access


The quality of a video clip is considered poor if its resolution or frame rate is low. Video interpolation is thus introduced to enhance video quality and provide a better viewing experience. However, some challenges remain, such as blur caused by motion changes. In this paper, we introduce a dual refinement technique for video interpolation (DRVI). It has three main steps: flow refinement, frame synthesis, and Haar refinement. The flow refinement can generate accurate bidirectional flows that are more suitable for frame interpolation. The Haar refinement uses the Discrete Wavelet Transform (DWT), which can preserve information in different frequency bands and also speeds up the learning process. We also add an arbitrary-time approximation module to allow multi-frame generation. Our model has far fewer learnable parameters than existing methods, yet it still achieves excellent performance. Our method is trained on Vimeo90K [1] and tested on three well-known datasets to demonstrate its effectiveness.
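The Haar refinement mentioned in this abstract relies on the DWT. As a generic illustration, not DRVI's module, the following sketch computes one level of the 2D Haar transform, splitting an image into a low-frequency approximation (LL) and three detail sub-bands at half resolution, and inverts it exactly. The orthonormal 1/2 scaling and the LH/HL naming are assumptions; conventions differ between libraries.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as a list of rows


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One level of the 2D Haar transform with orthonormal (1/2) scaling.

    Each 2x2 block [[a, b], [c, d]] yields one coefficient per sub-band:
      LL = (a + b + c + d) / 2   local average (low frequency)
      LH = (a + b - c - d) / 2   top row minus bottom row
      HL = (a - b + c - d) / 2   left column minus right column
      HH = (a - b - c + d) / 2   diagonal detail
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2
            lh[i // 2][j // 2] = (a + b - c - d) / 2
            hl[i // 2][j // 2] = (a - b + c - d) / 2
            hh[i // 2][j // 2] = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_idwt2(ll: Image, lh: Image, hl: Image, hh: Image) -> Image:
    """Exact inverse of haar_dwt2: rebuild each 2x2 block from its four coefficients."""
    h2, w2 = len(ll), len(ll[0])
    out = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, v, z, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + v + z + g) / 2          # a
            out[2 * i][2 * j + 1] = (s + v - z - g) / 2      # b
            out[2 * i + 1][2 * j] = (s - v + z - g) / 2      # c
            out[2 * i + 1][2 * j + 1] = (s - v - z + g) / 2  # d
    return out
```

Because each level halves the spatial resolution while keeping all of the information (the inverse reconstructs the input exactly), a network can operate on the smaller sub-bands and still recover full-resolution output, which is one plausible reason for the speed-up the abstract reports.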

