Video frame interpolation is a fundamental task in computer vision. Recent methods usually apply convolutional neural networks to generate an intermediate frame from two consecutive input frames. However, existing methods sometimes fail to handle complex motion and long-range dependencies. In this paper, a multi-scale dense attention generative adversarial network is proposed. First, a multi-scale generative adversarial framework is established for video frame interpolation, in which generators arranged from coarse to fine better combine global and local information. Second, an attention module introduced into the generator helps the network focus accurately on moving objects. Third, a sequence discriminator is designed to improve the ability to capture spatial and temporal consistency in the frame sequence. An ablation study demonstrates the effectiveness of these three contributions, and results on several datasets show that our approach attains higher performance and produces more photo-realistic in-between frames than previous works.
INDEX TERMS Video frame interpolation, generative adversarial networks, multi-scale pyramid, spatial and temporal consistency, sequence discriminator.

JIAN XIAO received the B.S. degree in communication engineering from Northeast Electric Power University, China, in 2015. He is currently pursuing the Ph.D. degree in information and communication engineering with Harbin Engineering University. His research interests include image processing, generative adversarial networks, video analysis, and computer vision.

XIAOJUN BI received the Ph.D. degree from the