A forest wildlife detection algorithm based on an improved YOLOv5s network model is proposed to advance forest wildlife monitoring and improve detection accuracy in complex forest environments. This research utilizes a data set from the Hunan Hupingshan National Nature Reserve in China, to which data augmentation and expansion methods are applied to extensively train the proposed model. To enhance the feature extraction ability of the proposed model, a weighted channel stitching method based on channel attention is introduced. The Swin Transformer module is combined with a CNN network to add a Self-Attention mechanism, thus improving the perceptual field for feature extraction. Furthermore, a new loss function (DIOU_Loss) and an adaptive class suppression loss (L_BCE) are adopted to accelerate the model’s convergence speed, reduce false detections in confusing categories, and increase its accuracy. When comparing our improved algorithm with the original YOLOv5s network model under the same experimental conditions and data set, significant improvements are observed, in particular, the mean average precision (mAP) is increased from 72.6% to 89.4%, comprising an accuracy improvement of 16.8%. Our improved algorithm also outperforms popular target detection algorithms, including YOLOv5s, YOLOv3, RetinaNet, and Faster-RCNN. Our proposed improvement measures can well address the challenges posed by the low contrast between background and targets, as well as occlusion and overlap, in forest wildlife images captured by trap cameras. These measures provide practical solutions for enhanced forest wildlife protection and facilitate efficient data acquisition.