During peak periods, urban rail transit carries large numbers of commuters, and passengers often have to wait for a later train because arriving trains are already running at full load, which lengthens their overall travel time. Calculating and predicting this congestion delay at subway stations can help the operating department and passengers make better planning and travel choices. In this paper, we propose a deep learning based method to evaluate and predict the congestion delay of subway stations. First, we use automatic fare collection (AFC) system data to evaluate the congestion delay at each station. Then, we use a convolutional long short-term memory (Conv-LSTM) network to extract spatial and temporal features for short-term prediction of congestion delay across the subway network. The spatiotemporal variables are inbound passenger flow, outbound passenger flow, number of passengers delayed, and average delay time. The task is treated as a spatiotemporal sequence prediction problem: both the input and the prediction target are three-dimensional spatiotemporal tensors, and the model is trained end to end. The effectiveness of the method is verified by a case study of Chongqing Rail Transit. Experimental results show that Conv-LSTM captures spatial and temporal correlations better than the benchmark models.
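
To make the tensor formulation concrete, the following is a minimal sketch of how per-interval station observations could be sliced into input and target tensors for sequence prediction. It is an illustration under stated assumptions, not the implementation used in this paper: the axis order (time, station, variable), the window lengths `n_in` and `n_out`, the names `make_windows`, `shape`, and `VARIABLES`, and the toy data are all hypothetical.

```python
from typing import List, Tuple

# One time interval: frame[station][variable]
Frame = List[List[float]]

# Assumed variable order, following the four variables named in the abstract.
VARIABLES = ("inbound_flow", "outbound_flow", "passengers_delayed", "avg_delay_time")


def make_windows(
    frames: List[Frame], n_in: int, n_out: int
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Slice a sequence of frames into (input, target) pairs.

    Each input holds n_in consecutive frames and each target holds the
    following n_out frames, so both have axes (time, station, variable).
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("n_in and n_out must be positive")
    pairs = []
    last_start = len(frames) - n_in - n_out
    # If the sequence is too short, the range is empty and no pairs are returned.
    for start in range(last_start + 1):
        x = frames[start:start + n_in]
        y = frames[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def shape(tensor: List[Frame]) -> Tuple[int, int, int]:
    """Return (time steps, stations, variables) of a nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


# Toy data, made up for illustration: 6 intervals, 3 stations, 4 variables.
# The first variable stores the interval index, the second the station index.
frames = [[[float(t), float(s), 0.0, 0.0] for s in range(3)] for t in range(6)]
pairs = make_windows(frames, n_in=4, n_out=1)
```

Here each pair consists of four past intervals as input and the next interval as the target. Deep learning frameworks generally expect further axes, such as a batch axis, in addition to the three shown here.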