Traditional medical image fusion methods, such as the well-known multi-scale decomposition-based methods, usually suffer from poor sparse representation of salient features and from fusion rules with limited ability to transfer the captured feature information. To address this problem, a medical image fusion method is proposed that is based on the scale-invariant feature transform (SIFT) descriptor and a deep convolutional neural network (CNN) in the shift-invariant shearlet transform (SIST) domain. First, the images to be fused are decomposed by the SIST into high-pass and low-pass coefficients. Then, the high-pass subbands are fused under a rule based on a pre-trained CNN model, which consists mainly of four steps: feature detection, initial segmentation, consistency verification, and final fusion. The low-pass subbands are fused according to a matching degree computed from the SIFT descriptor, which captures the features of the low-frequency components. Finally, the fused image is obtained by the inverse SIST. With standard deviation, Q^{AB/F}, entropy, and mutual information as objective measurements, the experimental results demonstrate that the proposed method preserves detailed information well without introducing artifacts or distortions, and that it also achieves better quantitative performance.
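Three of the objective measurements named above (standard deviation, entropy, and mutual information) can be computed from intensity histograms. The following is a minimal sketch, assuming 8-bit grayscale images stored as NumPy arrays. The function names and the constant `LEVELS` are illustrative, and `fusion_mutual_information` assumes the common convention of summing the mutual information between each source image and the fused image, which may differ from the exact definition used in the experiments. Q^{AB/F} is omitted because it depends on edge strength and orientation parameters not given here.

```python
import numpy as np

LEVELS = 256  # assumption: 8-bit grayscale images


def standard_deviation(img):
    """Standard deviation of pixel intensities (a contrast measure)."""
    return float(np.std(np.asarray(img, dtype=np.float64)))


def entropy(img):
    """Shannon entropy (in bits) of the intensity histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=LEVELS)
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-np.sum(p * np.log2(p)))


def mutual_information(a, b):
    """Mutual information (in bits) between two images of equal size."""
    a = np.asarray(a, dtype=np.uint8).ravel()
    b = np.asarray(b, dtype=np.uint8).ravel()
    joint = np.zeros((LEVELS, LEVELS), dtype=np.float64)
    np.add.at(joint, (a, b), 1.0)  # joint intensity histogram
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a, shape (LEVELS, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b, shape (1, LEVELS)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))


def fusion_mutual_information(src_a, src_b, fused):
    """Assumed convention: MI(A, F) + MI(B, F)."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```

Under these definitions, larger values of all three measurements are usually read as indicating that the fused image has higher contrast, carries more information, and retains more of the content of the source images.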