IntroductionThe precise detection of weeds in the field is the premise of implementing weed management. However, the similar color, morphology, and occlusion between wheat and weeds pose a challenge to the detection of weeds. In this study, a CSCW-YOLOv7 based on an improved YOLOv7 architecture was proposed to identify five types of weeds in complex wheat fields.MethodsFirst, a dataset was constructed for five weeds that are commonly found, namely, Descurainia sophia, thistle, golden saxifrage, shepherd’s purse herb, and Artemisia argyi. Second, a wheat weed detection model called CSCW-YOLOv7 was proposed to achieve the accurate identification and classification of wheat weeds. In the CSCW-YOLOv7, the CARAFE operator was introduced as an up-sampling algorithm to improve the recognition of small targets. Then, the Squeeze-and-Excitation (SE) network was added to the Extended Latent Attention Networks (ELAN) module in the backbone network and the concatenation layer in the feature fusion module to enhance important weed features and suppress irrelevant features. In addition, the contextual transformer (CoT) module, a transformer-based architectural design, was used to capture global information and enhance self-attention by mining contextual information between neighboring keys. Finally, the Wise Intersection over Union (WIoU) loss function introducing a dynamic nonmonotonic focusing mechanism was employed to better predict the bounding boxes of the occluded weed.Results and discussionThe ablation experiment results showed that the CSCW-YOLOv7 achieved the best performance among the other models. The accuracy, recall, and mean average precision (mAP) values of the CSCW-YOLOv7 were 97.7%, 98%, and 94.4%, respectively. Compared with the baseline YOLOv7, the improved CSCW-YOLOv7 obtained precision, recall, and mAP increases of 1.8%, 1%, and 2.1%, respectively. Meanwhile, the parameters were compressed by 10.7% with a 3.8-MB reduction, resulting in a 10% decrease in floating-point operations per second (FLOPs). The Gradient-weighted Class Activation Mapping (Grad-CAM) visualization method suggested that the CSCW-YOLOv7 can learn a more representative set of features that can help better locate the weeds of different scales in complex field environments. In addition, the performance of the CSCW-YOLOv7 was compared to the widely used deep learning models, and results indicated that the CSCW-YOLOv7 exhibits a better ability to distinguish the overlapped weeds and small-scale weeds. The overall results suggest that the CSCW-YOLOv7 is a promising tool for the detection of weeds and has great potential for field applications.