Accurate classification of moss species is essential for progress in ecology and biology. However, traditional classification methods require significant expertise, and current deep learning techniques struggle because of limited dataset diversity and poor performance on multi-class classification tasks. To overcome these challenges, we propose the Swin Routiformer, a new algorithm for moss image classification that enhances the Swin Transformer with bi-level routing attention. To address the issue of limited data, we constructed a dataset containing images of 110 moss types. We also propose Crop-Similar, a data augmentation algorithm designed specifically for moss images, which reduces background noise interference and prevents the information loss caused by feature scaling. Building on the Swin Transformer and its multi-level hierarchical architecture for visual feature extraction, we introduce the Swin Routiformer Block, which enhances the network's feature interaction capabilities, reduces computational complexity, and improves both classification accuracy and image processing speed for moss species. Experimental results show that the Swin Routiformer achieves a top-1 accuracy of 82.19% and an F1-score of 82.79% on the test set, outperforming most mainstream models and improving on the baseline Swin Transformer by 4.53% and 1.81%, respectively. These findings establish the Swin Routiformer as a valuable tool for the precise identification of moss species in ecological and biological research.
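
For readers unfamiliar with bi-level routing attention, the general idea can be illustrated with a minimal NumPy sketch. It is not the Swin Routiformer Block itself: the function name, tensor layout, single-head form, mean-pooled region descriptors, and the omission of learned projections and any local-context term are all simplifying assumptions made here for illustration. Each region first selects its `topk` most relevant regions from coarse region-level affinities, and ordinary token-level attention is then computed only over the tokens gathered from those regions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_level_routing_attention(q, k, v, topk):
    """Illustrative single-head bi-level routing attention (hypothetical helper).

    q, k, v: arrays of shape (regions, tokens_per_region, dim).
    Returns the attended output (same shape as q) and the routing indices.
    """
    R, T, D = q.shape

    # Level 1 (coarse): describe each region by the mean of its tokens
    # and score every region pair.
    q_region = q.mean(axis=1)                 # (R, D)
    k_region = k.mean(axis=1)                 # (R, D)
    affinity = q_region @ k_region.T          # (R, R)

    # Keep, for each query region, the indices of the topk highest-scoring regions.
    idx = np.argsort(-affinity, axis=1)[:, :topk]   # (R, topk)

    # Gather the keys and values of the routed regions only.
    k_routed = k[idx].reshape(R, topk * T, D)  # (R, topk*T, D)
    v_routed = v[idx].reshape(R, topk * T, D)

    # Level 2 (fine): token-to-token attention restricted to the routed tokens.
    scores = q @ k_routed.transpose(0, 2, 1) / np.sqrt(D)   # (R, T, topk*T)
    out = softmax(scores, axis=-1) @ v_routed                # (R, T, D)
    return out, idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(4, 3, 8)) for _ in range(3))
    out, idx = bi_level_routing_attention(q, k, v, topk=2)
    print(out.shape, idx.shape)
```

In this sketch each query token attends to `topk * T` tokens rather than all `R * T`, which is where the reduction in computation comes from. Setting `topk` equal to the number of regions recovers ordinary full attention over every token.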