Accurate atmospheric visibility prediction is of great significance for public transport safety. However, because visibility is affected by multiple factors, predicting its heterogeneous spatial distribution and rapid temporal variation remains difficult. In this paper, a recurrent neural network (RNN) prediction model, named SwiftRNN, is developed by incorporating a frame-hopping transmission gate (FHTG), a feature fusion module (FFM) and reverse scheduled sampling (RSS). The new FHTG accelerates training, the FFM extracts and fuses global and local features, and the RSS helps the model learn spatial details and improves prediction accuracy. Based on ground-based atmospheric visibility observations from the China Meteorological Information Center from 1 January 2018 to 31 December 2020, the SwiftRNN model and two traditional models, ConvLSTM and PredRNN, are applied to predict hourly atmospheric visibility over central and eastern China at lead times up to 12 h. The results show that the SwiftRNN model achieves better visibility prediction skill scores than the ConvLSTM and PredRNN models. The structural similarity (SSIM) averaged over lead times up to 12 h is 0.444, 0.425 and 0.399 for the SwiftRNN, PredRNN and ConvLSTM models, respectively, and the averaged learned perceptual image patch similarity (LPIPS) is 0.289, 0.315 and 0.328, respectively. The averaged critical success index (CSI) of predictions over fog areas at the 1000 m visibility threshold is 0.221, 0.205 and 0.194, respectively. Moreover, the SwiftRNN model trains 14.3% faster than the PredRNN model. It is also found that, compared with the ConvLSTM and PredRNN models, the prediction skill of the SwiftRNN model over 1000 m medium-grade fog areas improves significantly with lead time. These results demonstrate that the SwiftRNN model is a powerful tool for predicting atmospheric visibility.
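For clarity on the categorical skill score cited above, the following is a minimal sketch (not the authors' evaluation code) of how the critical success index (CSI) for fog areas could be computed from gridded predicted and observed visibility fields, assuming fog events are defined by the 1000 m visibility threshold mentioned in the abstract; the array names and example values are illustrative.

```python
import numpy as np

def fog_csi(pred_vis: np.ndarray, obs_vis: np.ndarray, threshold_m: float = 1000.0) -> float:
    """CSI = hits / (hits + misses + false alarms) for the fog/no-fog event.

    A grid point is counted as a fog event when visibility falls below the
    assumed threshold (1000 m by default).
    """
    pred_fog = pred_vis < threshold_m            # predicted fog mask
    obs_fog = obs_vis < threshold_m              # observed fog mask
    hits = np.sum(pred_fog & obs_fog)            # fog predicted and observed
    misses = np.sum(~pred_fog & obs_fog)         # fog observed but not predicted
    false_alarms = np.sum(pred_fog & ~obs_fog)   # fog predicted but not observed
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else np.nan

# Hypothetical 2 x 2 visibility fields in metres:
pred = np.array([[800.0, 1200.0], [600.0, 3000.0]])
obs = np.array([[900.0, 900.0], [700.0, 5000.0]])
print(fog_csi(pred, obs))  # 2 hits, 1 miss, 0 false alarms -> CSI = 0.667
```

In practice such a score would be averaged over all forecast hours and lead times before being compared across models, as done for the values reported above.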