Coal production often involves a substantial presence of gangue and foreign matter, which not only impacts the thermal properties of coal and but also leads to damage to transportation equipment. Selection robots for gangue removal have garnered attention in research. However, existing methods suffer from limitations, including slow selection speed and low recognition accuracy. To address these issues, this study proposes an improved method for detecting gangue and foreign matter in coal, utilizing a gangue selection robot with an enhanced YOLOv7 network model. The proposed approach entails the collection of coal, gangue, and foreign matter images using an industrial camera, which are then utilized to create an image dataset. The method involves reducing the number of convolution layers of the backbone, adding a small size detection layer to the head to enhance the small target detection, introducing a contextual transformer networks (COTN) module, employing a distance intersection over union (DIoU) loss border regression loss function to calculate the overlap between predicted and real frames, and incorporating a dual path attention mechanism. These enhancements culminate in the development of a novel YOLOv71 + COTN network model. Subsequently, the YOLOv71 + COTN network model was trained and evaluated using the prepared dataset. Experimental results demonstrated the superior performance of the proposed method compared to the original YOLOv7 network model. Specifically, the method exhibits a 3.97% increase in precision, a 4.4% increase in recall, and a 4.5% increase in mAP0.5. Additionally, the method reduced GPU memory consumption during runtime, enabling fast and accurate detection of gangue and foreign matter.