Rapid and precise detection and classification of vehicles are vital for intelligent transportation systems (ITSs). However, because vehicles on the road are often separated by only small gaps, and because photos and video frames containing vehicles also carry interference features, it is difficult to detect vehicles and identify their types quickly and precisely. To address this problem, this paper proposes a new vehicle detection and classification model, named YOLOv4_AF, based on an optimization of the YOLOv4 model. In the proposed model, an attention mechanism is used to suppress interference features along both the channel and spatial dimensions. In addition, the Feature Pyramid Network (FPN) part of the Path Aggregation Network (PAN) used by YOLOv4 is modified to further enhance the effective features through down-sampling. In this way, objects can be steadily positioned in 3D space, and the object detection and classification performance of the model can be improved. Experiments conducted on two public data sets demonstrate that the proposed YOLOv4_AF model outperforms the original YOLOv4 model and two other state-of-the-art models, Faster R-CNN and EfficientDet, in terms of mean average precision (mAP) and F1 score, achieving 83.45% and 0.816, respectively, on the BIT-Vehicle data set, and 77.08% and 0.808, respectively, on the UA-DETRAC data set.
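For readers less familiar with the F1 score reported above, the following minimal sketch shows the standard definition as the harmonic mean of precision and recall, computed from true-positive, false-positive, and false-negative detection counts. The function name `precision_recall_f1` and the counts in the usage line are hypothetical and purely illustrative; the abstract does not state the confidence or IoU thresholds used, nor how scores were averaged across vehicle classes, so this should not be read as the authors' evaluation procedure.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts.

    tp: detections matched to a ground-truth vehicle of the right class
    fp: detections with no matching ground-truth vehicle
    fn: ground-truth vehicles that were not detected
    """
    # Guard against division by zero when there are no detections
    # or no ground-truth objects.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only; these are NOT taken from the paper.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is commonly reported alongside mAP for detectors.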