Retinal vein occlusion (RVO) is the second most common cause of blindness after diabetic retinopathy. Manual screening of fundus images to detect RVO is time-consuming. Deep-learning techniques have been used to screen for RVO because of their outstanding performance in many applications. However, unlike other images, medical images contain smaller lesions, which require a more elaborate approach. To provide patients with an accurate diagnosis, followed by timely and effective treatment, we developed an intelligent method for automatic RVO screening on fundus images. Like a convolutional neural network, Swin Transformer learns a hierarchy of low- to high-level features. However, Swin Transformer extracts features from fundus images through attention modules, which place more emphasis on the interrelationships among features. The model is more universal, does not rely entirely on the data itself, and focuses not only on local information but also has a diffusion mechanism from local to global. To suppress overfitting, we adopt label smoothing, a regularization strategy that adds noise to the one-hot labels, reducing the weight given to the true class of each sample when the loss function is calculated. A comparison of different models using 5-fold cross-validation on our own datasets indicates that Swin Transformer performs better. The accuracy of classifying all datasets is 98.75 ± 0.000, and the accuracies of identifying MRVO, CRVO, BRVO, and normal using the proposed method are 94.49 ± 0.094, 99.98 ± 0.015, 98.88 ± 0.08, and 99.42 ± 0.012, respectively. The method will be useful for diagnosing RVO and helping to grade it from fundus images, and it has the potential to support further diagnosis and treatment of patients.
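
The label smoothing step can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothing factor `epsilon = 0.1`, the uniform spreading of `epsilon` over all classes, the class index order, and the function names are all assumptions made for illustration only.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Turn a one-hot label into a smoothed target distribution.

    Assumed formulation: epsilon is spread uniformly over all classes,
    and the true class keeps the remaining 1 - epsilon on top of its share.
    """
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed target and the softmax output."""
    target = smooth_labels(true_class, len(logits), epsilon)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))

# Four classes as in the abstract (MRVO, CRVO, BRVO, normal);
# the index order and the logit values here are hypothetical.
logits = [4.0, 0.5, 0.2, 0.1]
hard_loss = smoothed_cross_entropy(logits, true_class=0, epsilon=0.0)
soft_loss = smoothed_cross_entropy(logits, true_class=0, epsilon=0.1)
```

With `epsilon = 0.0` the target is the plain one-hot label, so `hard_loss` is the ordinary cross-entropy. With a positive `epsilon` the true class carries less weight in the target, so a very confident prediction is penalized slightly, which is the regularizing effect described above.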