Real‐time monitoring of the coal caving process in fully mechanized mining is crucial for achieving intelligent and efficient top‐coal caving. Although vision‐based coal gangue identification using deep learning has advanced intelligent monitoring, it depends on high‐performance hardware. This dependence makes it difficult to deploy identification equipment on mobile terminals and hinders the widespread application of the method. To address these issues, this paper presents a lightweight algorithm based on You Only Look Once version 5s (YOLOv5s) for real‐time perception of the top‐coal caving state in fully mechanized caving mining. We replace the backbone network of YOLOv5s with the ShuffleNetv2 structure, which is better suited to lightweight deployment, and add the Simple Attention Mechanism to the network to enhance the model's receptive field and feature expression ability and to reduce the impact of falling debris on the detection results. A dynamic experimental platform for top‐coal caving in fully mechanized caving mining of thick coal seams is set up, and preprocessing operations such as brightness adjustment, sharpening, and denoising are performed on the image data sets collected by high‐speed industrial cameras. The results show that, compared with the original YOLOv5s, the improved model's precision (P), mean average precision (mAP), and F1 score increase by 3.4%, 2.1%, and 1.1%, respectively; the model size is 70% of the original; and the detection frames per second (FPS) value increases by 48.1%. The lightweight algorithm maintains stable coal gangue identification accuracy in real time while greatly reducing the computing load on the mobile terminal, providing a theoretical and practical basis for real‐time monitoring of fully mechanized coal caving mining.
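As an illustration of why a ShuffleNetv2 backbone stays cheap, the following minimal sketch shows the channel split and channel shuffle steps of a ShuffleNetv2 basic unit, using a flat Python list to stand in for feature-map channels. It is not the paper's implementation: the function names `channel_shuffle` and `shuffle_unit`, and the `branch` callback that stands in for the convolutional branch, are hypothetical and chosen only for this example.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Treat the flat list as a (groups, per_group) grid and read it
    # column by column, so neighbouring outputs come from different groups.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def shuffle_unit(channels, branch):
    """Toy basic unit: split channels, transform one half, concat, shuffle.

    `branch` is a placeholder (assumption) for the convolutional branch
    of a real unit; here it is any function applied per channel.
    """
    half = len(channels) // 2
    identity, to_transform = channels[:half], channels[half:]
    transformed = [branch(c) for c in to_transform]
    # Shuffling with two groups mixes the untouched and transformed halves.
    return channel_shuffle(identity + transformed, groups=2)


if __name__ == "__main__":
    labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
    print(channel_shuffle(labels, groups=2))
    print(shuffle_unit(["a0", "a1", "b0", "b1"], branch=str.upper))
```

Only half of the channels pass through the transforming branch, and the shuffle itself is a pure reindexing with no learned parameters, which is what keeps such a unit inexpensive on constrained hardware.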