Complex and changing field environments make target detection in camera surveillance considerably more challenging. This paper proposes an improved YOLOv5s algorithm for detecting targets in camera surveillance images, in support of target tracking. First, to address the weak feature extraction ability of YOLOv5s for small-scale and overlapping targets, an attention mechanism is incorporated into the deep learning network and the loss function is replaced, improving feature extraction for target detection in field camera surveillance images. Then, to evaluate its performance, the improved algorithm is compared with the SSD, Faster R-CNN, and YOLOv5s detection methods in experiments on the dataset. The results show that the mean average precision (mAP) of the proposed algorithm is 19%, 14.5%, and 6.3% higher than that of SSD, Faster R-CNN, and YOLOv5s, respectively, and that its average detection speed is 324 FPS, so detection is both more accurate and faster. In addition, YOLOv5m with DA and PT achieves a higher AP than the other models considered in this paper. This study enhances the scalability of the YOLOv5s algorithm in complex environments, which is crucial for advancing image target detection.
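For readers unfamiliar with the AP and mAP figures quoted above, the following is a minimal sketch of how such metrics are commonly computed. It is not taken from this paper: it assumes all-point interpolation of the precision-recall curve (the Pascal VOC style), and it assumes each detection has already been matched to ground truth (for example by an IoU threshold) and flagged as a true or false positive. The function names `average_precision` and `mean_average_precision` are hypothetical.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (assumed protocol, not the paper's)."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp

    # Cumulative counts give one precision/recall point per detection.
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)

    # Sentinels so the curve spans recall 0 to 1.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))

    # Replace each precision with the maximum precision at any higher recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]

    # Area under the resulting step function.
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return float(np.mean(per_class_ap))


# Toy data (made up for illustration, not results from the paper).
ap_a = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_ground_truth=2)
ap_b = average_precision([0.9, 0.8], [1, 1], num_ground_truth=2)
print(ap_a, ap_b, mean_average_precision([ap_a, ap_b]))
```

In this toy example the first class has one false positive ranked between two true positives, so its AP falls below 1, while the second class is detected perfectly. Evaluation protocols differ in interpolation method and IoU threshold, so values computed this way are only comparable across models evaluated under the same protocol.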