Single Image Super-Resolution
Mikuláš Bankovič, Faculty of Informatics, Masaryk University
October 22, 2020

Introduction

Motivation: Why Super-Resolution (SR)?
- Games
- Medical imaging
- Photography details
- Astronomy
- Face and character recognition
- Project video699 [3]
- Super-scaling of FFFI movies

Metrics

The Peak Signal-to-Noise Ratio (PSNR), in dB, is defined as follows:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right),$$

where $\mathrm{MAX}_I$ is the maximum possible pixel value of the image. When the pixels are represented using 8 bits per sample, this is 255.

The Structural Similarity Index Measure (SSIM) is a weighted combination of luminance, contrast, and structure:

$$\mathrm{SSIM}(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}$$

The Mean Opinion Score (MOS) aggregates human ratings, usually on a 5-point scale.
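Below is a minimal NumPy sketch of the two fidelity metrics above. The helper names (psnr, ssim_global) are illustrative; ssim_global collapses SSIM to a single global window with α = β = γ = 1, whereas reference implementations average SSIM over small local windows.

```python
import numpy as np

def psnr(x, y, max_i=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_i ** 2 / mse)

def ssim_global(x, y, max_i=255.0):
    """Simplified SSIM over one global window (alpha = beta = gamma = 1,
    C3 = C2 / 2), which reduces the product of the luminance, contrast,
    and structure terms to the familiar two-factor formula."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * max_i) ** 2, (0.03 * max_i) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```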
History of SR

Bilinear and bicubic interpolation:
- no prior knowledge about images
- no way to fine-tune to a specific dataset
- does not improve with more data

Sparse-coding-based methods belong to the family of example-based learning methods. They consist of a multi-step pipeline:
1. Crop overlapping patches and preprocess them (subtract the mean and normalize).
2. Encode these patches with a Low-Resolution (LR) dictionary.
3. Pass the encoded coefficients to a High-Resolution (HR) dictionary.
4. Aggregate the overlapping HR patches.
These methods focus on optimizing and improving the dictionaries and the mapping between them while disregarding the other steps, and they often have to solve optimization problems at inference time.

SRCNN

The Super-Resolution Convolutional Neural Network (SRCNN), proposed by Dong et al. [1], is a Convolutional Neural Network (CNN) equivalent to the previous pipeline. This brings multiple advantages:
- Inference consists only of a feed-forward pass.
- The pipeline is unified, so each step is optimized during training.
- The dictionaries are not formed explicitly but are encoded in the weights.
- It provides superior quality and speed.

SRCNN is a simple CNN with three convolutional layers. The input image is first upscaled using bicubic interpolation and then passed through the network. Architecture variants differ in filter sizes: 9-1-5, 9-5-5, 11-5-7, etc.

Problems of SRCNN:
- Bicubic interpolation is an expensive operation that often introduces side effects such as blurring or noise amplification.
- Because of its small size, the network does not benefit from more data and could overfit.
- Most of the operations are performed in the expensive HR space.

FSRCNN

The main differences between SRCNN and the Fast Super-Resolution Convolutional Neural Network (FSRCNN):
- There is no pre-processing or upsampling at the beginning; feature extraction takes place in the LR space.
- A 1 × 1 convolution is used after the initial 5 × 5 convolution to reduce the number of channels, and hence computation and memory, similar to how the Inception network [4] is designed.
- Multiple 3 × 3 convolutions are used instead of one big convolutional filter, similar to how the VGG network simplifies the architecture to reduce the number of parameters.
- Upsampling is done by a learnt transposed convolution, thus improving the model. Animations of the transposed convolution, with and without stride, are given by Dumoulin and Visin [2].

FSRCNN is 17.36 times faster than SRCNN and can run in real time (24 fps) on a generic CPU. All layers except the last can be shared across multiple upscaling factors.

ESPCN

The Efficient Sub-Pixel Convolutional Neural Network (ESPCN) introduces the concept of sub-pixel convolution to replace the transposed convolution layer for upsampling. This solves two problems associated with it:
1. Transposed convolution happens in the high-resolution space and is thus more computationally expensive.
2. It resolves the checkerboard artifacts of deconvolution, which occur due to the overlap of convolution windows.

Sub-pixel convolutional layers
In the recent literature, these are called pixel-shuffle or depth-to-space layers. Pixels from multiple channels of a low-resolution image are rearranged into a single channel of a high-resolution image.

Figure: Lanczos (left) vs ESPCN (right)
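To make the upsampling difference concrete, here is a small PyTorch sketch (layer sizes are illustrative, not the exact configurations from the papers) contrasting FSRCNN-style transposed convolution with ESPCN-style sub-pixel convolution for scale factor r = 3. Both map a 32 × 32 feature map to a 96 × 96 image, but the convolution feeding the pixel shuffle runs entirely in LR space.

```python
import torch
import torch.nn as nn

lr = torch.randn(1, 64, 32, 32)  # a batch of low-resolution feature maps

# FSRCNN-style upsampling: a learnt transposed convolution with stride r.
deconv = nn.ConvTranspose2d(64, 1, kernel_size=9, stride=3,
                            padding=4, output_padding=2)

# ESPCN-style upsampling: an ordinary convolution produces r^2 channels,
# then pixel shuffle rearranges (N, C*r^2, H, W) into (N, C, H*r, W*r).
r = 3
conv = nn.Conv2d(64, 1 * r * r, kernel_size=3, padding=1)
shuffle = nn.PixelShuffle(r)

print(deconv(lr).shape)         # torch.Size([1, 1, 96, 96])
print(shuffle(conv(lr)).shape)  # torch.Size([1, 1, 96, 96])
```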
SRGAN

Problem: all previous methods were trained with an MSE loss function. However, the MSE-optimal image is not necessarily the most photo-realistic one.
Solution: a Generative Adversarial Network (GAN).

Problem: as in other computer vision disciplines, deeper models are more successful, but they are harder to train, for example because of the vanishing-gradient problem.
Solution: ResNet.

The Super-Resolution Generative Adversarial Network (SRGAN) paper achieves State Of The Art (SOTA) results on 4x upsampling, as measured by PSNR and SSIM, with a 16-block-deep SRResNet optimized for MSE. The authors then propose SRGAN, which replaces the MSE-based content loss with a loss computed on VGG layers. SRGAN generates SOTA results, which the authors validated with an extensive MOS test on three public benchmark datasets. The generator network uses two losses: MSE and a function based on the Euclidean distance between feature maps extracted from the VGG19 network.
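The VGG-based content loss can be sketched in PyTorch as below. This is a hedged approximation: the feature cut-off index (a deep VGG19 layer, as in common SRGAN implementations) and the helper name content_loss are assumptions, and the full SRGAN objective additionally adds a small weighted adversarial term for the generator.

```python
import torch.nn as nn
from torchvision.models import vgg19

# A fixed, pre-trained VGG19 truncated at a deep ReLU layer serves as
# the feature extractor; it is frozen and used only to compare images.
features = vgg19(pretrained=True).features[:36].eval()
for p in features.parameters():
    p.requires_grad = False

mse = nn.MSELoss()

def content_loss(sr, hr):
    """Euclidean (MSE) distance between VGG19 feature maps of the
    generated image (sr) and the ground truth (hr), plus pixel MSE."""
    return mse(features(sr), features(hr)) + mse(sr, hr)
```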
Problems of SRGAN:
- Expensive training.
- Inference is only a little less expensive.

Future:
- Residual networks?
- Attention-based networks?
- GANs?
- Progressive reconstruction networks?

Thank You for Your Attention!

Bibliography

[1] Chao Dong et al. "Image Super-Resolution Using Deep Convolutional Networks". URL: http://arxiv.org/abs/1501.00092 (visited on 10/05/2020).
[2] Vincent Dumoulin and Francesco Visin. "A guide to convolution arithmetic for deep learning". 2018. URL: https://arxiv.org/abs/1603.07285v2 (visited on 10/20/2020).
[3] Vít Novotný. "video699: Automatic alignment of lecture recordings with study materials". 2018. URL: https://github.com/video699 (visited on 01/10/2020).
[4] Christian Szegedy et al. "Rethinking the Inception Architecture for Computer Vision". 2015. URL: http://arxiv.org/abs/1512.00567 (visited on 10/20/2020).

Acronyms

CNN: Convolutional Neural Network
ESPCN: Efficient Sub-Pixel Convolutional Neural Network
FSRCNN: Fast Super-Resolution Convolutional Neural Network
GAN: Generative Adversarial Network
HR: High-Resolution
LR: Low-Resolution
MOS: Mean Opinion Score
PSNR: Peak Signal-to-Noise Ratio
SOTA: State Of The Art
SR: Super-Resolution
SRCNN: Super-Resolution Convolutional Neural Network
SRGAN: Super-Resolution Generative Adversarial Network
SSIM: Structural Similarity Index Measure