An accurate understanding of the temporal characteristics of urban rail transit (URT) passenger flows helps improve operational efficiency. This paper proposes a deep learning-based approach to passenger flow prediction in URT systems that first classifies stations by the temporal characteristics of their passenger flows. Firstly, dynamic time warping (DTW) is employed to quantify the dissimilarity between station passenger flow series, and the K-means algorithm is used to group stations according to these temporal attributes. Secondly, to mitigate the impact of data noise, the passenger flow data are decomposed using complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Finally, an improved time series Transformer model is proposed for deep learning prediction of passenger flows at the different types of stations. The approach is validated on passenger flow data from Xi'an Metro, which reveal four station types: occupation-residential balance type, business office type, leisure and entertainment type, and dense residential type. Compared with the Autoregressive Integrated Moving Average (ARIMA) model, the Support Vector Regression (SVR) model, the Long Short-Term Memory (LSTM) model, and the standard Transformer, the proposed method reduces the Mean Absolute Error (MAE) of the predictions across the different station types by 8.94% to 34.32%, the Root Mean Square Error (RMSE) by 8.30% to 36.50%, and the Mean Absolute Percentage Error (MAPE) by 13.77% to 38.84%. The proposed method therefore predicts passenger flows more accurately than the benchmark models across the different station types.
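To illustrate the first step, the following is a minimal sketch of how DTW can quantify the dissimilarity between two station passenger flow profiles. It assumes the textbook DTW recurrence with an absolute-difference local cost and no warping window; the function names `dtw_distance` and `dtw_matrix` and the sample profiles are hypothetical, and the exact DTW variant and the way it is combined with K-means in this work may differ.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: absolute difference as the local cost, no window constraint.
    Runs in O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def dtw_matrix(profiles):
    """Symmetric pairwise DTW dissimilarity matrix for a list of profiles."""
    k = len(profiles)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(profiles[i], profiles[j])
    return dist


# Hypothetical normalized flow profiles (illustrative values, not real data):
# the first two share the same peak shape, shifted by one time step.
station_a = [0, 1, 2, 1, 0, 0]
station_b = [0, 0, 1, 2, 1, 0]
station_c = [1, 1, 1, 1, 1, 1]
dissimilarity = dtw_matrix([station_a, station_b, station_c])
```

In this sketch the two shifted profiles obtain a DTW distance of zero although they differ point by point, which is the property that makes DTW suitable for comparing stations whose peaks occur at slightly different times. A matrix such as `dissimilarity` could then be passed to a clustering step.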