This study evaluates, for the first time, the feasibility of deep learning (DL) models for the multiclass classification of reflux esophagitis (RE) endoscopic images according to the Los Angeles (LA) classification. The images were divided into three groups: normal, LA grades A + B, and LA grades C + D. Images from the HyperKvasir dataset and Suzhou hospital were split into training and validation datasets at a ratio of 4:1, while images from Jintan hospital served as the independent test set. Models based on CNN or Transformer architectures (MobileNet, ResNet, Xception, EfficientNet, ViT, and ConvMixer) were trained by transfer learning using Keras. Model decisions were visualized using Gradient-weighted Class Activation Mapping (Grad-CAM). In both the validation set and the test set, the EfficientNet model showed the best performance, as follows: accuracy (0.962 and 0.957), recall for LA A + B (0.970 and 0.925) and LA C + D (0.922 and 0.930), macro-recall (0.946 and 0.928), Matthews correlation coefficient (0.936 and 0.884), and Cohen's kappa (0.910 and 0.850). This performance was better than that of the other models and the endoscopists. Grad-CAM heat maps generated from the EfficientNet model highlighted the target lesions on the original images. This study developed a series of DL-based computer vision models, made interpretable with Grad-CAM, to evaluate the feasibility of multiclass classification of RE endoscopic images. It suggests, for the first time, that DL-based classifiers show promise in the endoscopic diagnosis of esophagitis.
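The evaluation metrics named above (accuracy, per-class recall, macro-recall, Matthews correlation coefficient, and Cohen's kappa) can all be derived from a single confusion matrix. The following is a minimal sketch of those standard definitions, not the study's own evaluation code. The function name `classification_metrics` and the example matrix are hypothetical; the counts are illustrative only and are not the study's data. The sketch assumes every class contains at least one image.

```python
from math import sqrt


def classification_metrics(cm):
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j. Assumes every true class has at least one image.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))

    # Row sums are true class counts; column sums are predicted class counts.
    true_counts = [sum(cm[k]) for k in range(n_classes)]
    pred_counts = [sum(cm[i][k] for i in range(n_classes)) for k in range(n_classes)]

    recalls = [cm[k][k] / true_counts[k] for k in range(n_classes)]
    accuracy = correct / total
    macro_recall = sum(recalls) / n_classes  # unweighted mean of per-class recall

    # Multiclass Matthews correlation coefficient.
    overlap = sum(t * p for t, p in zip(true_counts, pred_counts))
    mcc_den = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - overlap) / mcc_den if mcc_den else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = overlap / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "recalls": recalls,
        "macro_recall": macro_recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts; rows and columns ordered as normal, LA A + B, LA C + D.
example_cm = [
    [50, 0, 0],
    [2, 45, 3],
    [0, 4, 46],
]
metrics = classification_metrics(example_cm)
print(metrics)
```

Macro-recall averages the per-class recalls without weighting by class size, so a weak result on a small class such as LA C + D is not masked by a large normal class. The Matthews correlation coefficient and Cohen's kappa both discount agreement that would be expected by chance.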