Accurate high spatial resolution snow depth mapping in arid and semi-arid regions is of great importance for snow disaster assessment and hydrological modeling. However, due to the complex topography and low spatial-resolution microwave remote-sensing data, the existing snow depth datasets have large errors and uncertainty, and actual spatiotemporal heterogeneity of snow depth cannot be effectively detected. This paper proposed a deep learning approach based on downscaling snow depth retrieval by fusion of satellite remote-sensing data with multiple spatial scales and diverse characteristics. The (Fengyun-3 Microwave Radiation Imager) FY-3 MWRI data were downscaled to 500 m resolution to match Moderate-resolution Imaging Spectroradiometer (MODIS) snow cover, meteorological and geographic data. A deep neural network was constructed to capture detailed spectral and radiation signals and trained to retrieve the higher spatial resolution snow depth from the aforementioned input data and ground observation. Verified by in situ measurements, downscaled snow depth has the lowest root mean square error (RMSE) and mean absolute error (MAE) (8.16 cm, 4.73 cm respectively) among Environmental and Ecological Science Data Center for West China Snow Depth (WESTDC_SD, 9.38 cm and 5.36 cm), the Microwave Radiation Imager (MWRI) Ascend Snow Depth (MWRI_A_SD, 9.45 cm and 5.49 cm) and MWRI Descend Snow Depth (MWRI_D_SD, 10.55 cm and 6.13 cm) in the study area. Meanwhile, downscaled snow depth could provide more detailed information in spatial distribution, which has been used to analyze the decrease of retrieval accuracy by various topography factors.