Objective: To design a deep learning model based on multimodal magnetic resonance imaging (MRI) sequences for automatic classification of parotid neoplasms, and to improve diagnostic decision-making in clinical settings. Methods: First, multimodal MRI sequences were collected from 266 patients with parotid neoplasms, and an artificial intelligence (AI)-based deep learning model was designed from scratch, combining the ResNet image classification network with the Transformer network from natural language processing. Second, the effectiveness of the model was improved through multimodal fusion of MRI sequences, and the fusion strategy for the various MRI sequences was optimized. In addition, we compared the model's performance in parotid neoplasm classification with that of experienced radiologists. Results: The deep learning model delivered reliable outcomes in differentiating benign from malignant parotid neoplasms. The model trained on the fusion of T2-weighted, postcontrast T1-weighted, and diffusion-weighted imaging (b = 1000 s/mm²) produced the best result, with an accuracy of 0.85, an area under the receiver operating characteristic (ROC) curve of 0.96, a sensitivity of 0.90, and a specificity of 0.84. In addition, the multimodal paradigm exhibited reliable outcomes in diagnosing pleomorphic adenoma and Warthin tumor, but not in identifying basal cell adenoma. Conclusion: An accurate and efficient AI-based classification model for parotid neoplasms was produced from the fusion of multimodal MRI sequences. Its performance clearly exceeded that of models using single MRI images or single MRI sequences as input, and potentially that of experienced radiologists.
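
For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed for a binary benign-versus-malignant task. It is not the authors' evaluation code. The function name `binary_metrics`, the choice of malignant as the positive class (label 1), and the toy labels are all assumptions made for illustration, and the toy labels are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumption (not stated in the abstract): 1 = malignant (positive class),
    0 = benign (negative class).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        # Fraction of all cases classified correctly.
        "accuracy": (tp + tn) / len(y_true),
        # Fraction of positive (malignant) cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of negative (benign) cases that were correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels, purely illustrative (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The area under the ROC curve is different in kind: it is computed from the model's continuous output scores across all decision thresholds rather than from a single set of hard predictions, so it is not covered by this sketch.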