The simulation and prediction of glacially derived runoff are important for water resource management and sustainable development in water-stressed arid regions. However, the application of hydrological models in such regions is typically limited by the intricate runoff production mechanism, which is associated with snow and ice melt, and by sparse monitoring data over glacierized headwaters. To address these limitations, this study develops a set of mathematical models with a certain physical significance by coupling long short-term memory networks with an efficient particle swarm optimization algorithm, and applies them to the glacierized Muzati River basin. First, the trends in runoff, precipitation, and air temperature from 1990 to 2015 are analyzed, and the differences in their correlations over this period are examined. Then, Particle Swarm Optimization–Long Short-Term Memory (PSO-LSTM) and Bi-directional Long Short-Term Memory (BiLSTM) models are applied to the precipitation and air temperature data to predict the glacially derived runoff. The prediction accuracy is validated against the runoff observed at the river outlet at the Pochengzi hydrological station. Finally, two other types of models, the Random Forest (RF) and Long Short-Term Memory (LSTM) models, are constructed to verify the prediction results. The results indicate that the glacially derived runoff is strongly correlated with air temperature and precipitation. However, over the 26-year study period, the air temperature in the study region showed no obvious increasing trend, whereas the precipitation and glacially derived runoff decreased significantly. The test results show that the PSO-LSTM and BiLSTM runoff prediction models perform better than the RF and LSTM models in the glacierized Muzati River basin.
In the validation period, the PSO-LSTM model has the smallest mean absolute error (6.082) and root-mean-square error (8.034) and the largest coefficient of determination (0.973) among all models. It is followed by the BiLSTM model, with a mean absolute error, root-mean-square error, and coefficient of determination of 6.751, 9.083, and 0.972, respectively. These results imply that both the particle swarm optimization algorithm and the bi-directional structure can effectively enhance the prediction accuracy of the baseline LSTM model. The results presented in this study provide a deeper understanding of, and a more appropriate method for predicting, glacially derived runoff in glacier-fed river basins.
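
For readers who wish to reproduce the three evaluation metrics named above, the following is a minimal sketch of how the mean absolute error, root-mean-square error, and coefficient of determination can be computed from paired observed and predicted runoff values. The function names and the sample values are hypothetical and are not taken from the study. The coefficient of determination is assumed here to be 1 minus the ratio of the residual to the total sum of squares; the study may use a different definition, such as the squared correlation coefficient.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical runoff values for illustration only (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("MAE: ", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R^2: ", r_squared(observed, predicted))
```

The two error metrics carry the units of the runoff series, so they are comparable across models only when all models are evaluated on the same validation data, as is the case for the four models compared here.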