This paper presents a review of deep learning (DL) based medical image registration methods. We summarize the latest developments and applications of DL-based registration methods in the medical field. These methods are classified into seven categories according to their methods, functions and popularity. A detailed review of each category is presented, highlighting important contributions and identifying specific challenges. Each detailed review is followed by a short assessment that summarizes the category's achievements and future potential. We provide a comprehensive comparison of DL-based methods for lung and brain registration on benchmark datasets. Lastly, we analyze the statistics of all the cited works from various perspectives, revealing the popularity and future trends of DL-based medical image registration.

arXiv:1912.12318v1 [eess.IV] 27 Dec 2019

1. Summarize the latest developments in DL-based medical image registration.
2. Highlight contributions, identify challenges and outline future trends.
3. Provide detailed statistics on recent publications from different perspectives.

2 Deep Learning
Convolutional Neural Network

A convolutional neural network (CNN) is a class of deep neural networks that can be viewed as a regularized multilayer perceptron. CNNs use convolution in place of the general matrix multiplication used in fully connected neural networks. Their convolutional filters and operations make CNNs well suited to processing visual imagery. Because of their excellent feature extraction ability, CNNs are among the most successful models for image analysis. Since the breakthrough of AlexNet [79], many CNN variants have been proposed and have achieved state-of-the-art performance in various image processing tasks. A typical CNN consists of multiple convolutional layers, max pooling layers, batch normalization layers and dropout layers, followed by a final sigmoid or softmax layer. In each convolutional layer, multiple channels of feature maps are extracted by sliding trainable convolutional kernels across the input feature maps. Stacking multiple convolutional layers extracts hierarchical features with increasingly high-level abstraction. These feature maps usually pass through several fully connected layers before reaching the final decision layer. Max pooling layers are often used to reduce the spatial size of the feature maps and to promote spatial invariance of the network. Batch normalization normalizes layer inputs over each mini-batch to reduce internal covariate shift during training. Weight regularization and dropout layers are used to alleviate overfitting. The loss function measures the difference between the predicted output and the target output. A CNN is usually trained by minimizing this loss with a gradient-based optimization method, using backpropagation to compute the gradients.

Many different types of network architectures have been proposed to improve the performance of CNNs [93]. U-Net, proposed by Ronneberger et al., is one of the most widely used network architectures [120]. U-Net was originally used to segment neuronal structures. U-Net adopts symmetrical contractiv...
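To make the CNN layer operations described above more concrete, the following is a minimal NumPy sketch (not code from any of the reviewed works) of one forward pass: a convolutional layer that slides trainable kernels across the input feature maps, batch normalization over a mini-batch, an activation, and 2x2 max pooling. The function names, tensor shapes, the choice of ReLU as the activation, and the conv, batch norm, ReLU, pool ordering are all illustrative assumptions; frameworks such as PyTorch or TensorFlow implement these layers far more efficiently and also provide the backward pass used for training.

```python
import numpy as np

def conv2d(x, kernels, bias):
    """Slide each trainable kernel across the input (valid padding, stride 1).

    x:       (C_in, H, W) input feature maps
    kernels: (C_out, C_in, kH, kW) convolutional kernels
    bias:    (C_out,) one bias per output channel
    Returns (C_out, H - kH + 1, W - kW + 1) output feature maps.
    Like most DL frameworks, this computes cross-correlation (no kernel flip).
    """
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for o in range(c_out):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                patch = x[:, i:i + kh, j:j + kw]  # window covering all input channels
                out[o, i, j] = np.sum(patch * kernels[o]) + bias[o]
    return out

def relu(x):
    """Element-wise ReLU activation (an assumed choice of nonlinearity)."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]  # drop rows/cols that do not fill a full window
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a mini-batch x of shape (N, C, H, W).

    Each channel is normalized with the mean and variance computed over the
    batch and spatial dimensions, then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

# Usage: a mini-batch of 4 single-channel 8x8 "images" with random values.
rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 1, 8, 8))
kernels = rng.standard_normal((3, 1, 3, 3)) * 0.1  # 3 output channels, 3x3 kernels
bias = np.zeros(3)

conv_out = np.stack([conv2d(img, kernels, bias) for img in batch])  # (4, 3, 6, 6)
normed = batch_norm(conv_out, gamma=np.ones(3), beta=np.zeros(3))
activated = relu(normed)
pooled = np.stack([max_pool2d(f) for f in activated])               # (4, 3, 3, 3)
print(pooled.shape)  # (4, 3, 3, 3)
```

The explicit loops in `conv2d` mirror the sliding-window description in the text and are kept for readability rather than speed. `batch_norm` here uses mini-batch statistics, as during training; at inference time, running averages collected during training would normally be used instead.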