Detection and counting of wheat heads are crucial for wheat yield estimation. To address the issues of overlapping and small volumes of wheat heads on complex backgrounds, this paper proposes the YOLOv7-MA model. By introducing micro-scale detection layers and the convolutional block attention module, the model enhances the target information of wheat heads and weakens the background information, thereby strengthening its ability to detect small wheat heads and improving the detection performance. Experimental results indicate that after being trained and tested on the Global Wheat Head Dataset 2021, the YOLOv7-MA model achieves a mean average precision (MAP) of 93.86% with a detection speed of 35.93 frames per second (FPS), outperforming Faster-RCNN, YOLOv5, YOLOX, and YOLOv7 models. Meanwhile, when tested under the three conditions of low illumination, blur, and occlusion, the coefficient of determination (R2) of YOLOv7-MA is respectively 0.9895, 0.9872, and 0.9882, and the correlation between the predicted wheat head number and the manual counting result is stronger than others. In addition, when the YOLOv7-MA model is transferred to field-collected wheat head datasets, it maintains high performance with MAP in maturity and filling stages of 93.33% and 93.03%, respectively, and R2 values of 0.9632 and 0.9155, respectively, demonstrating better performance in the maturity stage. Overall, YOLOv7-MA has achieved accurate identification and counting of wheat heads in complex field backgrounds. In the future, its application with unmanned aerial vehicles (UAVs) can provide technical support for large-scale wheat yield estimation in the field.