With vehicle ownership rising nationwide and traffic accidents increasing as a consequence, vehicle detection in traffic surveillance video is an effective means of reducing accidents. However, existing surveillance-video vehicle detection methods suffer from high computational load, low accuracy, and excessive reliance on large-scale computing servers. This study therefore improves the YOLOv5 network by fusing a coordinate attention (CA) mechanism into it. The lightweight YOLOv5s variant is chosen for image recognition, and the K-means algorithm is used to adjust the anchor boxes to the characteristics of vehicle detection. To obtain more accurate results, the coordinate attention mechanism, which is itself lightweight, is inserted into YOLOv5s, so that the resulting lightweight vehicle detection model can run on embedded devices. Experiments show that the YOLOv5+CA model converges once training exceeds 100 iterations: the localization loss and confidence loss gradually stabilize at 0.002 and 0.028, respectively, and the classification loss at 0.017. Compared with the SSD, ResNet-101, and RefineDet algorithms, YOLOv5+CA achieves about 9% higher detection accuracy, and its accuracy approaches 1.0 at a confidence level of 0.946. These results show that the proposed design provides both higher accuracy and high computational efficiency for video surveillance vehicle detection, and can serve as a reference for video surveillance vehicle detection and operation management.
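
The K-means anchor adjustment mentioned above can be illustrated with a minimal sketch. It assumes that labelled vehicle boxes are available as (width, height) pairs and that the clustering distance is 1 - IoU, a common choice for anchor clustering that the text itself does not specify. The function names `iou_wh` and `kmeans_anchors` and the sample box sizes are hypothetical and are not taken from the study.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) pairs aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors.

    Assumption: distance is 1 - IoU, so the nearest anchor is the one
    with the highest IoU. All widths and heights must be positive.
    """
    rng = random.Random(seed)
    # Initialise anchors from k distinct boxes in the data set.
    anchors = [tuple(map(float, b)) for b in rng.sample(boxes, k)]
    for _ in range(iters):
        # Assignment step: each box joins the anchor it overlaps most.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            groups[best].append(b)
        # Update step: each anchor moves to the mean width and height of
        # its group; an anchor with an empty group is left unchanged.
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else a
            for g, a in zip(groups, anchors)
        ]
        if new == anchors:  # assignments no longer change the anchors
            break
        anchors = new
    # Return anchors ordered from smallest to largest area.
    return sorted(anchors, key=lambda a: a[0] * a[1])


if __name__ == "__main__":
    # Hypothetical box sizes in pixels: small, medium, and large vehicles.
    sample = [(12, 10), (14, 11), (60, 35), (64, 38), (210, 120), (220, 110)]
    print(kmeans_anchors(sample, k=3))
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when distant and nearby vehicles appear in the same surveillance frame. The result depends on the random initialisation, so in practice the clustering would likely be repeated with several seeds.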