As a branch of target recognition, surface target recognition plays an irreplaceable role in both military and civilian applications. However, large variation in target size, low image resolution, and strict real-time requirements pose challenges to existing algorithms. To address these issues, we build on YOLOv5 and adopt coordinate attention and a double-layer cascade structure to improve both recognition performance and speed. Specifically, coordinate attention is introduced to guide the network to focus on discriminative features by capturing channel and location information. Meanwhile, the double-layer cascade structure is designed to finely extract and aggregate semantic and spatial features at different scales. We test the model on the COCO dataset, the VOC dataset, and a self-built surface target dataset. Experimental results show that the proposed coordinate attention module and multiscale module improve the recognition of multiscale surface targets and meet real-time requirements.
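To make the idea of capturing channel and location information concrete, the following is a minimal NumPy sketch of a coordinate-attention-style gate on a single feature map. It is an illustration under stated assumptions, not the module used in this work: the function name `coordinate_attention`, the weight names `w_reduce`, `w_h`, `w_w`, and all tensor sizes are hypothetical, the weights are random rather than learned, batch normalization is omitted, and ReLU stands in for whatever non-linearity the actual module uses.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate-attention-style gating (illustrative sketch only).

    x        : feature map of shape (C, H, W)
    w_reduce : shared 1x1-conv weights, shape (M, C)  (hypothetical name)
    w_h, w_w : per-direction 1x1-conv weights, shape (C, M)  (hypothetical names)
    """
    _, h, _ = x.shape
    # Directional pooling: averaging along one spatial axis keeps
    # positional information along the other axis.
    pooled_h = x.mean(axis=2)  # (C, H): one descriptor per row
    pooled_w = x.mean(axis=1)  # (C, W): one descriptor per column
    # Concatenate both directions and mix channels with a shared 1x1 conv.
    y = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    y = np.maximum(w_reduce @ y, 0.0)                 # (M, H + W), ReLU assumed
    # Split back into the two directions and compute gates in (0, 1).
    a_h = sigmoid(w_h @ y[:, :h])  # (C, H)
    a_w = sigmoid(w_w @ y[:, h:])  # (C, W)
    # Reweight every position by its row gate and its column gate.
    return x * a_h[:, :, None] * a_w[:, None, :]


# Usage with arbitrary sizes and random (untrained) weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 20))
w_reduce = rng.standard_normal((4, 8)) * 0.1
w_h = rng.standard_normal((8, 4)) * 0.1
w_w = rng.standard_normal((8, 4)) * 0.1
out = coordinate_attention(x, w_reduce, w_h, w_w)
print(out.shape)  # (8, 16, 20)
```

Because each gate depends on a row or column index as well as on the channel, the reweighting is position-aware along both spatial axes, which is the property the abstract refers to as capturing channel and location information.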