Wildlife is an integral part of natural ecosystems, and protecting it plays a crucial role in maintaining ecological balance. Deep-learning-based wildlife detection in images and videos can substantially reduce labor costs and is of great value for wildlife monitoring and protection. However, complex and changing outdoor environments often degrade detection results because of insufficient lighting, mutual occlusion, and blur. This paper proposes TMS-YOLO (Takin, Monkey, and Snow Leopard-You Only Look Once), a modification of YOLOv7 optimized specifically for wildlife detection. It uses the newly designed O-ELAN (Optimized Efficient Layer Aggregation Networks) and O-SPPCSPC (Optimized Spatial Pyramid Pooling Combined with Cross Stage Partial Channel) modules and incorporates CBAM (Convolutional Block Attention Module) to improve its suitability for this task. In simple terms, O-ELAN uses residual structures to preserve a portion of the original features during image feature extraction, which yields richer features of both animals and background. Because the extracted features therefore contain more background information, we apply CBAM after the backbone to suppress background features and enhance animal features. Then, during feature fusion, we use O-SPPCSPC, which has fewer network layers, to avoid overfitting. Comparative experiments were conducted on a self-built dataset and a Turkish wildlife dataset. TMS-YOLO outperformed YOLOv7 on both: the mAP (mean Average Precision) of YOLOv7 on the two datasets was 90.5% and 94.6%, respectively, whereas that of TMS-YOLO was 93.4% and 95%, respectively. These findings indicate that TMS-YOLO achieves more accurate wildlife detection than YOLOv7.
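
To make the role of the attention step more concrete, the following is a minimal PyTorch sketch of a generic CBAM block (channel attention followed by spatial attention) applied to a backbone feature map. It is an illustration only, not the TMS-YOLO implementation: the reduction ratio of 16, the 7x7 spatial kernel, and the 512-channel 20x20 feature map are assumed values, and the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights each channel using globally pooled descriptors."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction is an assumed value
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Shared two-layer MLP, written as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # global average pooling
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # global max pooling
        return torch.sigmoid(avg + mx)                           # shape (N, C, 1, 1)


class SpatialAttention(nn.Module):
    """Weights each spatial location using channel-pooled maps."""

    def __init__(self, kernel_size: int = 7):  # kernel size is an assumed value
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)  # average over channels
        mx = torch.amax(x, dim=1, keepdim=True)   # maximum over channels
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # shape (N, 1, H, W)


class CBAM(nn.Module):
    """Channel attention followed by spatial attention; output shape equals input shape."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)     # rescale channels
        return x * self.spatial(x)  # rescale spatial locations


# Hypothetical backbone output: batch of 1, 512 channels, 20x20 grid.
feats = torch.randn(1, 512, 20, 20)
refined = CBAM(512)(feats)
```

Because both attention maps pass through a sigmoid, every value in `refined` is the corresponding value in `feats` scaled by a factor between 0 and 1. After training, small factors would ideally fall on background regions and larger ones on animal regions, which is the suppression and enhancement effect described above.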