To address the high missed-detection rate of the YOLOv7 algorithm for vehicle detection on urban roads, its weak perception of small targets in perspective views, and its insufficient feature extraction, the YOLOv7-RAR recognition algorithm is proposed. The algorithm improves on YOLOv7 in three ways. First, because the original backbone network fuses nonlinear features insufficiently, the Res3Unit structure is used to reconstruct the YOLOv7 backbone so that the network can extract more nonlinear features. Second, because urban roads contain many distracting backgrounds and the original network localizes targets such as vehicles poorly, a plug-and-play hybrid attention module, ACmix, is added after the SPPCSPC layer of the backbone network to strengthen the network's attention to vehicles and reduce interference from other targets. Finally, because the receptive field of the original network narrows as the network deepens, which leads to a high miss rate for small targets, the Gaussian receptive field scheme from the RFLA (Gaussian-receptive-field-based label assignment) module is applied at the connection between the feature fusion area and the detection head to improve the network's receptive field for small objects in the image. The improved algorithm is named YOLOv7-RAR after the initial letters of the three improvements (Res3Unit, ACmix, and RFLA). Experiments show that on urban roads with crowded vehicles and varying weather conditions, the average detection accuracy of the YOLOv7-RAR algorithm reaches 95.1%, which is 2.4% higher than that of the original algorithm; the AP50:90 performance is 12.6% higher than that of the original algorithm. 
The YOLOv7-RAR algorithm runs at 96 FPS, which meets the real-time requirements of vehicle detection, so the algorithm is better suited to vehicle detection than the original.
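
To make the Gaussian receptive field idea more concrete, the following is a minimal sketch of how a feature point's receptive field and a ground-truth box can each be modeled as an axis-aligned 2D Gaussian and compared with the squared 2-Wasserstein distance. It is an illustration of the general idea only, not the implementation used in YOLOv7-RAR or in RFLA. The function names, the choice of half the box width and height as standard deviations, and the `1 / (1 + distance)` normalization are all assumptions made for this example.

```python
import math


def gaussian_w2_squared(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between two axis-aligned 2D Gaussians.

    For diagonal covariances this reduces to the squared distance between
    the means plus the squared distance between the standard deviations.
    """
    center_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    spread_term = sum((a - b) ** 2 for a, b in zip(std_a, std_b))
    return center_term + spread_term


def receptive_field_score(point, rf_radius, box):
    """Similarity in (0, 1] between a receptive field and a box (hypothetical helper).

    point:     (x, y) location of the feature point in image coordinates
    rf_radius: assumed radius of the effective receptive field, in pixels
    box:       ground-truth box as (center_x, center_y, width, height)
    """
    cx, cy, w, h = box
    # Assumption: the box is modeled as a Gaussian with std = half its size.
    d2 = gaussian_w2_squared(point, (rf_radius, rf_radius), (cx, cy), (w / 2, h / 2))
    # Assumption: map the distance to a bounded score; 1.0 means a perfect match.
    return 1.0 / (1.0 + math.sqrt(d2))


if __name__ == "__main__":
    small_vehicle = (10.0, 10.0, 8.0, 8.0)
    print(receptive_field_score((10.0, 10.0), 4.0, small_vehicle))
    print(receptive_field_score((13.0, 14.0), 4.0, small_vehicle))
```

Unlike an overlap-based measure, a score of this kind stays nonzero and varies smoothly even when a feature point lies outside a very small box, which is the property that makes a Gaussian formulation attractive for small objects.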