This paper presents a Spatial-Temporal Graph Convolutional Network-based Pedestrians’ behaviors Anomaly Detection system (STGCN-PAD) for grade crossings. Pedestrian behaviors are represented in a structured manner by skeleton trajectories generated with a pose estimation model. ST-GCN components are applied sequentially to capture the spatial dependencies among skeleton key points within a single video frame and the temporal relationships of each key point across frames. Based on these features, the system reconstructs input trajectories using a sliding window of constant size, and the reconstruction error is used to distinguish abnormal behaviors from normal ones. To accelerate the processing of the extracted multi-dimensional feature maps, an MLP-Mixer-based reconstruction network is developed as an alternative to the traditional convolutional neural network. Only trajectories of normal walking behavior are used for model training, so anomalies such as lingering and squatting can be identified as outliers by the magnitude of their reconstruction errors. The case studies demonstrate the feasibility and efficiency of the proposed system: it achieves performance at least comparable to several state-of-the-art approaches (approximately 88% AUC), while the MLP-Mixer model accelerates inference by 10× relative to our previous effort (Song et al. in Appl Intell 53:21676–21691, 2023).
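
The reconstruction-error scoring described above can be illustrated with a minimal sketch. It is not the STGCN-PAD implementation: the constant-velocity reconstructor below is a toy stand-in for the ST-GCN feature extractor and MLP-Mixer reconstruction network, and all function names, the synthetic single-key-point data, the window size of 4, and the threshold are assumptions chosen for illustration only.

```python
from typing import Callable, List, Sequence

Frame = List[float]    # flattened key-point coordinates for one video frame
Window = List[Frame]   # a fixed number of consecutive frames


def sliding_windows(trajectory: Sequence[Frame], size: int, stride: int = 1) -> List[Window]:
    """Cut a skeleton trajectory into fixed-size windows of consecutive frames."""
    last_start = len(trajectory) - size
    return [list(trajectory[i:i + size]) for i in range(0, last_start + 1, stride)]


def reconstruction_error(window: Window, reconstructed: Window) -> float:
    """Mean squared error over every coordinate in the window."""
    squared = [(a - b) ** 2
               for frame, rec in zip(window, reconstructed)
               for a, b in zip(frame, rec)]
    return sum(squared) / len(squared)


def fit_mean_step(normal_trajectories: Sequence[Sequence[Frame]]) -> Frame:
    """Toy 'training': average per-frame displacement, using normal walking only."""
    dims = len(normal_trajectories[0][0])
    totals = [0.0] * dims
    count = 0
    for traj in normal_trajectories:
        for prev, cur in zip(traj, traj[1:]):
            for d in range(dims):
                totals[d] += cur[d] - prev[d]
            count += 1
    return [t / count for t in totals]


def make_reconstructor(step: Frame) -> Callable[[Window], Window]:
    """Hypothetical stand-in for the reconstruction network:
    replay the learned per-frame step starting from the window's first frame."""
    def reconstruct(window: Window) -> Window:
        first = window[0]
        return [[x + t * s for x, s in zip(first, step)] for t in range(len(window))]
    return reconstruct


def score_trajectory(trajectory: Sequence[Frame],
                     reconstruct: Callable[[Window], Window],
                     size: int, stride: int = 1) -> List[float]:
    """One reconstruction error per sliding window."""
    return [reconstruction_error(w, reconstruct(w))
            for w in sliding_windows(trajectory, size, stride)]


# Synthetic normal data: one key point moving 1 unit per frame along x.
normal = [[float(t), 0.0] for t in range(20)]
reconstruct = make_reconstructor(fit_mean_step([normal]))

# Test trajectory: 10 walking frames, then 10 frames standing still (lingering).
test = [[float(t), 0.0] for t in range(10)] + [[9.0, 0.0]] * 10
scores = score_trajectory(test, reconstruct, size=4)

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper
flags = [s > THRESHOLD for s in scores]
```

In this toy setup, windows that contain only walking frames reconstruct exactly and score zero, while windows dominated by standing-still frames produce larger errors and exceed the hypothetical threshold. A threshold-free metric such as AUC, as reported in the paper, would instead rank windows by these scores.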