Understanding the distribution of rock glaciers provides key information for assessing the status of the cryosphere and how it is changing. Deep learning algorithms applied to the red–green–blue (RGB) bands of high-resolution satellite images have been extensively employed to map rock glaciers. The near-infrared (NIR) band offers rich spectral information and sharp edge features that could contribute significantly to semantic segmentation, but it is rarely used in rock glacier identification models because classical semantic segmentation networks, such as DeeplabV3+, accept only three input bands. In this study, a dual-encoder DeeplabV3+ network (DEDNet) was designed to overcome this limitation of the classical DeeplabV3+ network (CDNet) when identifying rock glaciers in multispectral remote sensing images; it extracts spatial and spectral features from the RGB and NIR bands, respectively. Trained with manually labeled rock glacier samples from the Qilian Mountains, the network yielded a model with accuracy, precision, recall, specificity, and mean intersection over union (mIoU) of 0.9131, 0.9130, 0.9270, 0.9195, and 0.8601, respectively. The trained model was then applied to identify new rock glaciers in a test region, achieving a producer’s accuracy of 93.68% and a user’s accuracy of 94.18%. Furthermore, the model was applied to two study areas in northern Tien Shan (Kazakhstan) and Daxue Shan (Hengduan Shan, China) with high accuracy, which indicates that DEDNet is robust across diverse geographic regions and offers an innovative way to map rock glaciers more accurately over larger areas.
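
For readers unfamiliar with the evaluation metrics listed above, the following minimal Python sketch shows how accuracy, precision, recall, specificity, and mIoU are commonly derived from a pixel-wise confusion matrix. It assumes a binary task (rock glacier versus background) and an mIoU averaged over those two classes; the abstract does not give the exact formulas used in the study, and the function name `segmentation_metrics` and the toy masks are hypothetical.

```python
def segmentation_metrics(pred, truth):
    """Binary segmentation metrics from two flat 0/1 masks of equal length.

    Assumption: 1 = rock glacier pixel, 0 = background pixel.
    This is an illustrative sketch, not the study's evaluation code.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Count the four confusion-matrix cells pixel by pixel.
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return num / den if den else 0.0

    iou_foreground = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        # Assumed definition: mean IoU over the two classes.
        "miou": (iou_foreground + iou_background) / 2,
    }


# Toy masks (hypothetical data, eight pixels).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On real imagery the two masks would be the flattened model prediction and the manually labeled reference for the same tile. Producer’s and user’s accuracy are often computed analogously to recall and precision, but whether the study evaluates them per pixel or per mapped rock glacier is not stated in the abstract.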