With the rapid development of the power system and the increasing demand for intelligent operation, substation operation training has received growing attention. Action recognition, built on computer vision and artificial intelligence, automatically identifies and tracks personnel actions in video frames. In a training setting, it can detect abnormal behaviors such as illegal operations and provide real-time feedback to trainers or surveillance systems. The commonly adopted strategy is to first extract human skeletons from videos and then classify the resulting skeleton sequences. Although skeleton-based methods built on graph convolutional networks (GCNs) have achieved impressive performance, they operate primarily in the spatial dimension and cannot accurately describe dependencies between different time intervals in the temporal dimension. In addition, existing methods typically process the temporal and spatial dimensions separately, with no effective communication between them. To address these issues, we propose a skeleton-based method that aggregates convolutional information at different scales along the time dimension to form a new scale dimension. We also introduce a space-time-scale attention module that enables communication among the three dimensions and generates weights across them for prediction. The proposed method is validated on the public NTU60 and NTU120 datasets, and the experimental results verify its effectiveness. For substation operation training, we built a real-time recognition system based on the proposed method. On an evaluation set of over 400 collected videos covering five action categories, the system achieved an accuracy of over 98%.
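The idea of forming a scale dimension and weighting space, time, and scale jointly can be illustrated with a minimal NumPy sketch. This is not the proposed network: the fixed moving-average kernels, the kernel sizes (3, 5, 7), the channel-mean energy, the joint softmax, and all function names are assumptions chosen only to show how the tensor shapes fit together. A real model would use learned convolutions and a learned attention module.

```python
import numpy as np

def temporal_conv(x, kernel_size):
    """Moving-average temporal filter with 'same' padding (stand-in for a learned conv).
    x: (C, T, V) = channels, frames, joints. kernel_size must be odd."""
    pad = kernel_size // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    t = x.shape[1]
    # Average kernel_size shifted copies of the sequence along the time axis.
    windows = np.stack([xp[:, i:i + t, :] for i in range(kernel_size)], axis=0)
    return windows.mean(axis=0)

def multi_scale_stack(x, kernel_sizes=(3, 5, 7)):
    """Stack temporal filters of several sizes along a new scale axis: (C, S, T, V)."""
    return np.stack([temporal_conv(x, k) for k in kernel_sizes], axis=1)

def space_time_scale_attention(feats):
    """Hypothetical attention: one weight per (scale, frame, joint) position.
    feats: (C, S, T, V). Returns fused features (C, T, V) and weights (S, T, V)."""
    _, s, t, v = feats.shape
    energy = feats.mean(axis=0).reshape(-1)       # pool channels, flatten S*T*V
    energy = energy - energy.max()                # numerical stability
    weights = np.exp(energy) / np.exp(energy).sum()
    weights = weights.reshape(s, t, v)            # joint softmax over all three axes
    fused = (feats * weights[None]).sum(axis=1)   # weight, then collapse the scale axis
    return fused, weights

# Example: 3 coordinate channels, 16 frames, 25 joints (illustrative sizes).
rng = np.random.default_rng(0)
skeleton = rng.standard_normal((3, 16, 25))
feats = multi_scale_stack(skeleton)
fused, weights = space_time_scale_attention(feats)
```

Because the softmax is taken jointly over scale, frame, and joint, a large response at one position lowers the weights everywhere else, which is one simple way to let the three dimensions interact rather than be handled separately.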