Accurate and timely prediction of pedestrian crossing intention is critical for operating intelligent vehicles on roads. Although existing models achieve promising accuracy using complex architectures and video image data, their practical real-time use is constrained by high model complexity, time-consuming data preprocessing, and low-quality image data in the wild. To address these issues, this paper proposes a Spatial-Temporal Attention Graph Convolution Network model for fast pedestrian crossing intention prediction (PedAST-GCN). The model uses a lightweight GCN as its backbone together with simple yet robust graph representations of crossing-intention modality features, including pedestrian pose, bounding box, and vehicle speed. It is validated against state-of-the-art models on two large-scale public datasets (JAAD and PIE). The results show that PedAST-GCN outperforms these models in both accuracy and computation time. The ablation analysis confirms the value of the backbone layer and graph design and of the selected modality features, the effectiveness of the attention mechanisms in capturing long-term dependencies (spatial-temporal attention) and fusing heterogeneous features (modality attention), and the model's robust performance across various observation lengths and in the presence of noisy data.
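
To make the described pipeline concrete, the following is a minimal sketch of the overall structure implied by the abstract: per-modality graph encoders for pose, bounding box, and vehicle speed, a spatial-temporal attention stage, a modality-attention fusion step, and a binary crossing-intention classifier. All module names, layer sizes, and the exact attention formulation here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a PedAST-GCN-style pipeline (assumed structure, not the paper's code).
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Graph-convolution-style encoder for one modality (pose, bounding box, or speed)."""

    def __init__(self, in_dim: int, hid_dim: int, num_nodes: int):
        super().__init__()
        # Learnable adjacency over the modality's graph nodes (e.g. pose keypoints).
        self.adj = nn.Parameter(torch.eye(num_nodes) + 0.01 * torch.randn(num_nodes, num_nodes))
        self.proj = nn.Linear(in_dim, hid_dim)

    def forward(self, x):                      # x: (batch, time, nodes, in_dim)
        x = self.proj(x)                       # node-wise feature projection
        x = torch.einsum("vw,btwd->btvd", torch.softmax(self.adj, dim=-1), x)
        return torch.relu(x)


class PedASTGCNSketch(nn.Module):
    def __init__(self, hid_dim: int = 64, num_keypoints: int = 17, num_heads: int = 4):
        super().__init__()
        self.pose_enc = ModalityEncoder(in_dim=2, hid_dim=hid_dim, num_nodes=num_keypoints)
        self.box_enc = ModalityEncoder(in_dim=4, hid_dim=hid_dim, num_nodes=1)
        self.speed_enc = ModalityEncoder(in_dim=1, hid_dim=hid_dim, num_nodes=1)
        # Spatial-temporal attention over the flattened (time x node) token sequence.
        self.st_attn = nn.MultiheadAttention(hid_dim, num_heads, batch_first=True)
        # Modality attention: learns weights for fusing the three modality summaries.
        self.mod_attn = nn.Linear(hid_dim, 1)
        self.classifier = nn.Linear(hid_dim, 2)  # crossing vs. not crossing

    def _summarize(self, feats):               # feats: (batch, time, nodes, hid)
        b, t, v, d = feats.shape
        tokens = feats.reshape(b, t * v, d)
        attended, _ = self.st_attn(tokens, tokens, tokens)
        return attended.mean(dim=1)            # (batch, hid)

    def forward(self, pose, box, speed):
        summaries = torch.stack(
            [self._summarize(self.pose_enc(pose)),
             self._summarize(self.box_enc(box)),
             self._summarize(self.speed_enc(speed))], dim=1)      # (batch, 3, hid)
        weights = torch.softmax(self.mod_attn(summaries), dim=1)  # (batch, 3, 1)
        fused = (weights * summaries).sum(dim=1)
        return self.classifier(fused)          # intention logits


if __name__ == "__main__":
    # 16 observed frames, 17 pose keypoints (x, y), one box (x, y, w, h), scalar speed.
    pose = torch.randn(8, 16, 17, 2)
    box = torch.randn(8, 16, 1, 4)
    speed = torch.randn(8, 16, 1, 1)
    logits = PedASTGCNSketch()(pose, box, speed)
    print(logits.shape)  # torch.Size([8, 2])
```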