Timely and accurate rainfall prediction is crucial to social life and economic activity. Because rainfall is influenced by numerous factors, precise prediction is challenging. In this study, the northern Xinjiang region of China is selected as the research area. Based on local rainfall patterns and practical needs, rainfall is divided into four levels, namely ‘no rain’, ‘light rain’, ‘moderate rain’, and ‘heavy rain and above’, and nowcasting is treated as predicting the rainfall level. Existing models extract only a single kind of temporal dependence and therefore lose some of the valuable information in rainfall data. To address this problem, this paper proposes DFFNet, a prediction model based on dual-branch feature fusion. One branch is a Transformer that extracts temporal dependence from the meteorological data; the other is a CNN that extracts feature interactions. The features extracted by the two branches are fused for prediction. To verify the performance of DFFNet, a public rainfall dataset from India and several sub-datasets of the UEA dataset are also used for comparison. DFFNet achieves the best prediction performance among the baseline models on all selected datasets. Compared with the single-branch model, DFFNet reduces training time on the two rainfall datasets by 21% and 9.6%, respectively, and converges faster. The experimental results indicate that DFFNet has both theoretical and practical value for research on rainfall nowcasting.
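The dual-branch idea described above can be illustrated with a minimal forward-pass sketch. This is not the DFFNet implementation: a single self-attention pass stands in for the Transformer branch, a single 1-D convolution across the variable axis stands in for the CNN branch, fusion is assumed to be plain concatenation, and all weights are random and untrained. The function names, the window size, the number of variables, and the kernel width are hypothetical choices made only for illustration, and the sketch uses pure Python to stay dependency-free where a real model would use a deep learning framework.

```python
import math
import random

LEVELS = ["no rain", "light rain", "moderate rain", "heavy rain and above"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_branch(window):
    """Stand-in for the Transformer branch: one self-attention pass over time.

    window is a list of T time steps, each a list of F variables.
    Returns an F-dimensional vector (attention output averaged over time).
    """
    n_feat = len(window[0])
    scale = math.sqrt(n_feat)
    attended = []
    for query in window:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in window]
        weights = softmax(scores)
        attended.append([sum(w * step[j] for w, step in zip(weights, window))
                         for j in range(n_feat)])
    return [sum(row[j] for row in attended) / len(attended) for j in range(n_feat)]


def interaction_branch(window, kernel):
    """Stand-in for the CNN branch: 1-D convolution across the variable axis.

    Slides the kernel over neighbouring variables in each time step, applies
    ReLU, then averages over time. Returns F - len(kernel) + 1 values.
    """
    k = len(kernel)
    n_out = len(window[0]) - k + 1
    pooled = [0.0] * n_out
    for step in window:
        for i in range(n_out):
            act = sum(kernel[j] * step[i + j] for j in range(k))
            pooled[i] += max(0.0, act)
    return [p / len(window) for p in pooled]


def predict_level(window, kernel, head_weights):
    """Fuse both branches by concatenation (an assumption) and classify."""
    fused = temporal_branch(window) + interaction_branch(window, kernel)
    logits = [sum(w * x for w, x in zip(row, fused)) for row in head_weights]
    probs = softmax(logits)
    return LEVELS[probs.index(max(probs))], probs, fused


# Hypothetical sizes: 6 time steps, 5 meteorological variables, kernel width 3.
rng = random.Random(0)
T, F, K = 6, 5, 3
window = [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(T)]
kernel = [rng.uniform(-1, 1) for _ in range(K)]
head_weights = [[rng.uniform(-1, 1) for _ in range(F + F - K + 1)] for _ in LEVELS]

level, probs, fused = predict_level(window, kernel, head_weights)
print(level, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted level is meaningless; the sketch only shows how a time-oriented representation and a variable-interaction representation of the same input window can be computed separately and joined before a four-class output.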