Deep learning methods have been widely applied to seismic denoising, outperforming conventional methods in efficiency and generalization. For internal multiple suppression, however, deep learning models generalize poorly owing to the variability of internal multiples, which diminishes their advantage over current processing flows. To overcome this, we redesign the label generation and training process of a convolutional neural network (CNN) method for internal multiple suppression. We apply the virtual event (VE) method to a small amount of data and use the gathers with internal multiples removed as labels to accelerate network training, a strategy we term multiple learning. Instead of training a single universal model for all datasets, we rely on transfer learning to generalize: we fine-tune the model pretrained on synthetic data on each target dataset, obtaining a dataset-specific model with low requirements of training data and time. Tests on synthetic and field data demonstrate the effects of multiple learning and transfer learning, as well as the competitive demultiple performance of our method compared with both the VE method and the original CNN in efficiency and primary-preserving ability.
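To make the transfer-learning step concrete, the sketch below shows one plausible way to fine-tune a synthetically pretrained CNN denoiser on a small target dataset. It is a minimal illustration, not the paper's exact configuration: the architecture (`DenoiserCNN`), the number of frozen layers, the checkpoint filename, and all hyperparameters are assumptions introduced here for demonstration. The labels in `target_loader` are assumed to be gathers with internal multiples removed by the VE method, as described above.

```python
# Hypothetical sketch of the transfer-learning step: a CNN pretrained on
# synthetic data (with VE-derived multiple-free gathers as labels) is
# fine-tuned on a small amount of target data. Architecture and
# hyperparameters are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class DenoiserCNN(nn.Module):
    """Plain convolutional denoiser mapping gathers containing internal
    multiples to multiple-free gathers (a stand-in architecture)."""
    def __init__(self, channels=32, depth=6):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def finetune(model, target_loader, epochs=20, lr=1e-4, freeze_up_to=4):
    """Fine-tune a synthetically pretrained model on a small target dataset.
    Early layers are frozen so only the later layers adapt, which keeps the
    demand for target-domain training data and time low."""
    for i, layer in enumerate(model.net):
        if i < freeze_up_to:
            for p in layer.parameters():
                p.requires_grad = False
    opt = torch.optim.Adam(
        filter(lambda p: p.requires_grad, model.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for gather, label in target_loader:  # (input gather, VE-labeled gather)
            opt.zero_grad()
            loss = loss_fn(model(gather), label)
            loss.backward()
            opt.step()
    return model

# Usage: load weights from synthetic-data pretraining, then adapt to
# the target dataset ("synthetic_pretrained.pt" is a placeholder name).
# model = DenoiserCNN()
# model.load_state_dict(torch.load("synthetic_pretrained.pt"))
# model = finetune(model, target_loader)
```

Freezing the early layers is one common transfer-learning choice under these assumptions: the low-level features learned on synthetic data are reused, while only the later layers are retrained on the few labeled target gathers.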