To address the difficulty of recognising smoking and phone-call behaviours of people against the complex backgrounds of construction sites, a method for recognising human elbow flexion behaviour based on posture estimation is proposed. First, AlphaPose is retrained on the required upper-body key points to achieve human object localisation and key point detection. Then, a mathematical model for human elbow flexion behaviour discrimination (HEFBD model) is built on these key points; the same key points are used to locate the region of interest for small object detection, which reduces interference from the complex background. A super-resolution image reconstruction method is used to pre-process blurred images. In addition, YOLOv5s is improved by adding a small object detection layer and integrating a convolutional block attention module to improve detection performance. With the proposed method, detection precision is improved by 5.6% and the false detection rate caused by complex backgrounds is reduced by 13%; the method outperforms other state-of-the-art detection methods and meets the requirement of real-time performance.
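The abstract does not state the formulation of the HEFBD model, so the following is only a minimal illustrative sketch of how elbow flexion might be discriminated from pose key points. It assumes flexion is judged from the angle at the elbow between the upper arm and the forearm, and that the region of interest is a square centred on the wrist. The function names, the 90-degree threshold and the `scale` parameter are all hypothetical and are not taken from the paper.

```python
import math


def elbow_angle(shoulder, elbow, wrist):
    """Angle in degrees at the elbow between upper arm and forearm.

    Each argument is an (x, y) key point in image coordinates.
    """
    ux, uy = shoulder[0] - elbow[0], shoulder[1] - elbow[1]  # elbow -> shoulder
    fx, fy = wrist[0] - elbow[0], wrist[1] - elbow[1]        # elbow -> wrist
    norm = math.hypot(ux, uy) * math.hypot(fx, fy)
    if norm == 0:
        raise ValueError("degenerate key points: zero-length limb")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_a = max(-1.0, min(1.0, (ux * fx + uy * fy) / norm))
    return math.degrees(math.acos(cos_a))


def is_elbow_flexed(shoulder, elbow, wrist, max_angle_deg=90.0):
    """Assumed rule: the arm counts as flexed below a threshold angle."""
    return elbow_angle(shoulder, elbow, wrist) < max_angle_deg


def hand_roi(elbow, wrist, scale=1.0):
    """Assumed ROI: square centred on the wrist, sized by forearm length.

    Returns (x_min, y_min, x_max, y_max).
    """
    half = scale * math.hypot(wrist[0] - elbow[0], wrist[1] - elbow[1])
    return (wrist[0] - half, wrist[1] - half, wrist[0] + half, wrist[1] + half)
```

In a pipeline of the kind described, only frames for which `is_elbow_flexed` returns true would be cropped to `hand_roi` and passed to the small object detector, which is one way the background interference could be limited.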