Zehan Wang scite author profile

Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method. arXiv:1609.04802v5 [cs.CV]
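
The perceptual loss described in this abstract can be written schematically as a weighted sum of its two parts. This is a sketch under stated assumptions, not the paper's exact formulation: the abstract says only that the loss combines a content term and an adversarial term, so the weight $\lambda$, the feature extractor $\phi$, the normalizer $N$, and the specific form of each term are illustrative choices. $G$ denotes the super-resolution network and $D$ the discriminator.

```latex
% Assumed notation: I^{LR}, I^{HR} are the low- and high-resolution images,
% \phi is a fixed feature extractor, N is the number of feature elements.
\ell^{SR} = \ell_{\text{content}} + \lambda\,\ell_{\text{adv}},
\qquad
\ell_{\text{content}} = \frac{1}{N}\,\bigl\lVert \phi(I^{HR}) - \phi\bigl(G(I^{LR})\bigr) \bigr\rVert_2^2,
\qquad
\ell_{\text{adv}} = -\log D\bigl(G(I^{LR})\bigr)
```

Under this reading, the content term compares images in a feature space rather than in pixel space, and the adversarial term is small when the discriminator rates the super-resolved image as a natural one.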

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Shi, Caballero, Huszár et al. 2016

5,632

3,058

Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
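
The rearrangement step of a sub-pixel convolution layer can be illustrated with a small NumPy sketch. It assumes the common "pixel shuffle" channel ordering, in which r*r low-resolution feature maps are interleaved into one high-resolution channel; the function name `pixel_shuffle` and the toy input are hypothetical, and the learned convolution that would produce the feature maps is omitted.

```python
import numpy as np

def pixel_shuffle(features, r):
    """Rearrange (C*r*r, H, W) low-resolution feature maps into (C, H*r, W*r)."""
    c_r2, h, w = features.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Split the channel axis into (C, r, r): one feature map per sub-pixel offset.
    x = features.reshape(c, r, r, h, w)
    # Interleave the two offset axes with the spatial axes: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# Toy input (assumption): four 2x2 feature maps, upscaled by r=2 into one 4x4 image.
lr_features = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
hr = pixel_shuffle(lr_features, r=2)
print(hr.shape)
```

All of the arithmetic before this step runs on H×W maps rather than on the upscaled grid, which is the source of the computational saving the abstract describes.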

Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation

et al. 2017

Convolutional neural networks have enabled accurate image super-resolution in real-time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while maintaining real-time speed. Specifically, we discuss the use of early fusion, slow fusion and 3D convolutions for the joint processing of multiple consecutive video frames. We also propose a novel joint motion compensation and video super-resolution algorithm that is orders of magnitude more efficient than competing methods, relying on a fast multi-resolution spatial transformer module that is end-to-end trainable. These contributions provide both higher accuracy and temporally more consistent videos, which we confirm qualitatively and quantitatively. Relative to single-frame models, spatio-temporal networks can either reduce the computational cost by 30% whilst maintaining the same quality or provide a 0.2dB gain for a similar computational cost. Results on publicly available datasets demonstrate that the proposed algorithms surpass current state-of-the-art performance in both accuracy and efficiency.
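
Of the fusion strategies named in this abstract, early fusion is the simplest to illustrate: the temporal dimension is collapsed at the network input by stacking consecutive frames along the channel axis. The NumPy sketch below shows only that input preparation. The function name `early_fusion_input`, the window size, and the edge handling (repeating the first or last frame) are assumptions and are not taken from the paper.

```python
import numpy as np

def early_fusion_input(frames, center, depth=3):
    """Stack `depth` consecutive frames around `center` along the channel axis.

    frames: array of shape (T, C, H, W); depth is assumed to be odd.
    Returns an array of shape (depth * C, H, W).
    """
    half = depth // 2
    t = frames.shape[0]
    # Clamp indices at the clip boundaries so edge frames are repeated (assumption).
    idx = np.clip(np.arange(center - half, center + half + 1), 0, t - 1)
    window = frames[idx]  # (depth, C, H, W)
    return window.reshape(-1, *frames.shape[2:])

# Toy clip (assumption): 5 single-channel frames of size 2x2.
frames = np.arange(5 * 1 * 2 * 2, dtype=np.float32).reshape(5, 1, 2, 2)
fused = early_fusion_input(frames, center=0)
print(fused.shape)
```

A single 2D convolution applied to the fused input then sees all frames in the window at once, whereas slow fusion and 3D convolutions merge the temporal information gradually across layers.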

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Ledig, Theis, Huszár et al. 2016

Preprint

347

396

Discriminative dictionary learning for abdominal multi-organ segmentation

Tong, Wolz, Wang et al. 2015

Medical Image Analysis

145

105

An automated segmentation method is presented for multi-organ segmentation in abdominal CT images. Dictionary learning and sparse coding techniques are used in the proposed method to generate target-specific priors for segmentation. The method simultaneously learns dictionaries which have reconstructive power and classifiers which have discriminative ability from a set of selected atlases. Based on the learnt dictionaries and classifiers, probabilistic atlases are then generated to provide priors for the segmentation of unseen target images. The final segmentation is obtained by applying a post-processing step based on a graph-cuts method. In addition, this paper proposes a voxel-wise local atlas selection strategy to deal with high inter-subject variation in abdominal CT images. The segmentation performance of the proposed method with different atlas selection strategies is also compared. Our proposed method has been evaluated on a database of 150 abdominal CT images and achieves a promising segmentation performance with Dice overlap values of 94.9%, 93.6%, 71.1%, and 92.5% for liver, kidneys, pancreas, and spleen, respectively.
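
The Dice overlap reported in this abstract is a standard measure of agreement between a predicted and a reference segmentation mask: twice the size of their intersection divided by the sum of their sizes. A minimal NumPy sketch follows. The function name `dice_overlap`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dice_overlap(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy masks (assumption): 3 labelled voxels each, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_overlap(pred, truth)
print(f"{score:.3f}")
```

In a multi-organ setting the score would be computed per organ label, which is how a single method can report separate values for liver, kidneys, pancreas, and spleen.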
