Accurate prediction of citywide short-term metro passenger flow is essential to urban management and transport scheduling. Recently, an increasing number of researchers have applied deep learning models to passenger flow prediction. Nevertheless, the task remains challenging because of the complex spatial dependencies within the metro network and the time-varying traffic patterns. We therefore propose a novel deep learning architecture that combines graph attention networks (GAT) with long short-term memory (LSTM) networks, called the hybrid GLM (hybrid GAT and LSTM Model). The proposed model captures spatial dependencies through the graph attention layers and learns temporal dependencies through the LSTM layers. It also embeds external factors. We tested the hybrid GLM by predicting metro passenger flow in Shanghai, China, and compared the results with forecasts from several typical data-driven models. The hybrid GLM achieves the smallest root-mean-square error (RMSE) and mean absolute percentage error (MAPE) across different time intervals (TIs), which demonstrates the superiority of the proposed model. In particular, at the 10-min TI, the hybrid GLM yields an additional improvement of about 6–30% in terms of RMSE. We also explore the sensitivity of the model to its parameters, which will aid its practical application.
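For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how RMSE and MAPE are commonly computed. It is an illustration and not the evaluation code used in this study: the function names, the sample values, and the choice to skip zero-valued observations in MAPE are all assumptions.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: observations with zero actual flow are skipped, because the
    percentage error is undefined there. Other conventions exist.
    """
    if len(actual) != len(predicted):
        raise ValueError("inputs must be of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical passenger counts at one station over three time intervals.
observed = [100, 200, 400]
forecast = [110, 190, 400]

print(f"RMSE: {rmse(observed, forecast):.3f}")
print(f"MAPE: {mape(observed, forecast):.3f}%")
```

RMSE is expressed in the same unit as the flow itself and penalizes large deviations more heavily, whereas MAPE is scale-free, so the two metrics are usually reported together.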