Short-term prediction of origin–destination (OD) flow is a fundamental yet challenging task for urban rail operators, and it underpins intelligent, real-time urban rail transit (URT) operation and management. Three characteristics distinguish short-term URT OD flow prediction from other short-term prediction tasks: data lag, data dimensionality, and data malconformation. A prediction algorithm designed around these characteristics is therefore needed. To this end, we establish a modified spatial–temporal long short-term memory (ST-LSTM) model based on deep learning methods and multi-source big data. The proposed model comprises four components: (1) a temporal feature extraction module that extracts time information from network-wide historical OD data; (2) a spatial correlation learning module that addresses the data malconformation and data dimensionality problems and provides an interpretable method for quantifying spatial correlation; (3) a newly proposed input control-gated mechanism that addresses the data lag problem by combining the processed available OD flow with real-time inflow/outflow; and (4) a fusion module that combines historical spatial–temporal features with real-time information to achieve accurate OD flow prediction. We also discuss the interpretability of the model in detail. The ST-LSTM model is evaluated through extensive experiments on two large-scale real-world subway datasets from Nanjing and Beijing. The experimental results demonstrate that it learns spatial–temporal correlations better than the benchmark methods and outperforms them.
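To make the data lag problem and the role of an input gate concrete, the following is a minimal illustrative sketch, not the model's actual implementation. It assumes that data lag arises because an OD pair is recorded only once the passenger exits, so at prediction time only part of the latest OD matrix is observable. The trip records, the function names `od_matrix` and `gated_input`, and the scalar gate parameters `w_in`, `w_out`, and `bias` are all hypothetical and are introduced here only for illustration.

```python
import numpy as np

def od_matrix(trips, n_stations, t_start, t_end, known_by=None):
    """Count trips that entered in [t_start, t_end).

    If known_by is given, keep only trips that had already exited by that
    time, i.e. the part of the OD matrix observable at prediction time.
    """
    od = np.zeros((n_stations, n_stations), dtype=int)
    for origin, dest, t_in, t_out in trips:
        if t_start <= t_in < t_end and (known_by is None or t_out <= known_by):
            od[origin, dest] += 1
    return od

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_input(available_od, inflow, outflow, w_in, w_out, bias):
    """Hypothetical input gate (an assumption, not the paper's formulation).

    Real-time inflow per origin and outflow per destination decide how much
    of each lagged OD entry is passed on to the predictor.
    """
    gate = sigmoid(w_in * inflow[:, None] + w_out * outflow[None, :] + bias)
    return gate * available_od

# Hypothetical trips: (origin, destination, entry minute, exit minute).
trips = [(0, 1, 0, 10), (0, 1, 5, 40), (1, 2, 12, 20), (2, 0, 14, 50)]

full_od = od_matrix(trips, 3, 0, 15)                    # known only in hindsight
available_od = od_matrix(trips, 3, 0, 15, known_by=30)  # observable at minute 30

inflow = np.array([2.0, 1.0, 1.0])   # entries per station (illustrative values)
outflow = np.array([0.0, 1.0, 1.0])  # exits per station (illustrative values)
gated = gated_input(available_od, inflow, outflow, w_in=0.0, w_out=0.0, bias=0.0)
```

In this toy setting, two of the four trips that entered during the interval have not yet exited at minute 30, so the observable matrix undercounts the true OD flow. In a trained model the gate parameters would be learned rather than fixed by hand as they are here.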