Efficient gradient propagation through the intermediate layers of convolutional neural networks is of key importance for the super-resolution task. To this end, we propose a deep architecture for single image super-resolution (SISR), which is built using efficient convolutional units we refer to as mixed-dense connection blocks (MDCB). The design of MDCB combines the strengths of both residual and dense connection strategies, while overcoming their limitations. To enable super-resolution for multiple factors, we propose a scale-recurrent framework which reutilizes the filters learnt for lower scale factors recursively for higher factors. This leads to improved performance and promotes parametric efficiency for higher factors. We train two versions of our network to enhance complementary image qualities using different loss configurations. We further employ our network for the video super-resolution task, where it learns to aggregate information from multiple frames and maintain spatio-temporal consistency. The proposed networks lead to qualitative and quantitative improvements over state-of-the-art techniques on image and video super-resolution benchmarks.
Introduction

Single image super-resolution (SISR) aims to estimate a high-resolution (HR) image from a low-resolution (LR) input image, and is an ill-posed problem. Due to its diverse applicability, ranging from surveillance to medical diagnosis, and from remote sensing to HDTV, the SISR problem has gathered substantial attention from the computer vision and image processing communities. The ill-posed nature of the problem is generally addressed by learning an LR-HR mapping function in a constrained environment using example HR-LR patch pairs.

One way is to learn a mapping function that linearly correlates the HR-LR patch pairs. Such linear functions can be learned easily from few example images, as has been practiced by some SR approaches [1,2,3]. However, a linear mapping between such patch pairs may not be representative enough to capture the varied complex structures present in an image. The mapping function would therefore benefit from learning non-linear relationships between HR-LR patch pairs. Recent convolutional neural network (CNN) based models are quite effective for this purpose, and can extract increasingly relevant features as the models are made deeper. However, deeper models often face vanishing/exploding gradient issues, which can be partially mitigated by using residual mappings [4,5]. Deep residual models have been employed for higher-level vision tasks, where batch normalization is generally used to obtain a class-specific normalized representation. However, such a representation is not very useful in low-level vision tasks such as SR [6]. Most deep CNN based SR models do not make full use of the hierarchical features from the original LR images. Thus, there is scope for improving performance by effectively employing the hierarchical features from all the convolutional layers, as has been done by the residual dense network [7] using a sequence of residual dense blocks...
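To make the distinction between the connection strategies discussed above concrete, the following is a minimal sketch of a convolutional block that mixes dense connectivity (each layer sees the concatenation of all earlier features) with a residual skip over the whole block, in the spirit of the MDCB idea. The layer count, growth rate, and 1x1 fusion convolution are illustrative assumptions, not the exact configuration used in this work.

import torch
import torch.nn as nn

class MixedDenseBlock(nn.Module):
    """Illustrative block mixing dense connections with a residual skip."""

    def __init__(self, channels: int = 64, growth: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(
                nn.Sequential(
                    nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                )
            )
            in_ch += growth  # dense connectivity: each layer receives all earlier features
        # 1x1 convolution fuses the concatenated features back to `channels`
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        # residual connection: fused dense features are added to the block input
        return x + self.fuse(torch.cat(features, dim=1))

# Usage: y = MixedDenseBlock()(torch.randn(1, 64, 48, 48))  # y has the same shape as the input

The dense concatenations let later layers reuse all earlier feature maps, while the additive skip keeps a short gradient path through the block; the 1x1 fusion keeps the output channel count fixed so such blocks can be stacked.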