Accurate traffic status prediction is of great importance to improve the security and reliability of the intelligent transportation system. However, urban traffic status prediction is a very challenging task due to the tight symmetry among the Human–Vehicle–Environment (HVE). The recently proposed spatial–temporal 3D convolutional neural network (ST-3DNet) effectively extracts both spatial and temporal characteristics in HVE, but ignores the essential long-term temporal characteristics and the symmetry of historical data. Therefore, a novel spatial–temporal 3D residual correlation network (ST-3DRCN) is proposed for urban traffic status prediction in this paper. The ST-3DRCN firstly introduces the Pearson correlation coefficient method to extract a high correlation between traffic data. Then, a dynamic spatial feature extraction component is constructed by using 3D convolution combined with residual units to capture dynamic spatial features. After that, based on the idea of long short-term memory (LSTM), a novel architectural unit is proposed to extract dynamic temporal features. Finally, the spatial and temporal features are fused to obtain the final prediction results. Experiments have been performed using two datasets from Chengdu, China (TaxiCD) and California, USA (PEMS-BAY). Taking the root mean square error (RMSE) as the evaluation index, the prediction accuracy of ST-3DRCN on TaxiCD dataset is 21.4%, 21.3%, 11.7%, 10.8%, 4.7%, 3.6% and 2.3% higher than LSTM, convolutional neural network (CNN), 3D-CNN, spatial–temporal residual network (ST-ResNet), spatial–temporal graph convolutional network (ST-GCN), dynamic global-local spatial–temporal network (DGLSTNet), and ST-3DNet, respectively.