Snow depth is a key input variable in many agricultural, hydrological, climate, and ecological models. However, retrieving snow depth by remote sensing remains subject to uncertainty: errors in snow depth estimates derived with differential interferometric synthetic aperture radar (D-InSAR) methods reduce the accuracy of snow depth inversion to some extent. This study proposes a scheme for estimating spatial snow depth that combines remote sensing with station observations. First, the scheme uses C-band Sentinel-1 data from the European Space Agency (ESA) and applies two-pass differential interferometry to invert spatial snow depth. Second, a three-dimensional variational (3DVAR) fusion algorithm integrates measured snow depth from virtual stations and real observation stations into the inversion results, improving the accuracy of the snow depth inversion. The scheme is applied to the Bayanbulak Basin, located in the central hinterland of the Tianshan Mountains in Xinjiang, China, and observation data from stations at different altitudes are selected to test the fusion method. The results show that most snow depth values obtained by interferometry are lower than the observed values. After 3DVAR fusion, however, snow depth accuracy is higher than in the inversion results (R² = 0.31 vs. R² = 0.50 and RMSE = 2.51 cm vs. RMSE = 1.96 cm; R² = 0.27 vs. R² = 0.46 and RMSE = 4.04 cm vs. RMSE = 3.65 cm), and the relative error (RE) improves by 6.97% and 3.59%, respectively, compared with the inversion results. These results indicate that the scheme can effectively improve the accuracy of regional snow depth estimation and has considerable potential for future application.
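The 3DVAR fusion step can be illustrated with a minimal sketch of the standard 3DVAR analysis. This is not the authors' implementation. For a linear observation operator H and Gaussian errors, minimizing the cost J(x) = (x - x_b)^T B^-1 (x - x_b) + (y - Hx)^T R^-1 (y - Hx) gives the closed-form analysis x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b), where x_b is the background field (here, the InSAR snow depth) and y holds the station observations. The following choices are all hypothetical assumptions made for illustration: the Gaussian background-error correlation, the error variances `sigma_b2` and `sigma_o2`, the correlation length, the toy grid, and the station values. The sketch also assumes that H simply picks the grid cell containing each (real or virtual) station.

```python
import math


def gaussian_cov(pts_a, pts_b, sigma2, length):
    """Assumed background-error covariance: sigma2 * exp(-d^2 / (2 L^2))."""
    return [[sigma2 * math.exp(-((xa - xb) ** 2 + (ya - yb) ** 2) / (2.0 * length ** 2))
             for (xb, yb) in pts_b] for (xa, ya) in pts_a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def threedvar_analysis(grid_pts, background, obs_idx, obs_vals, sigma_b2, sigma_o2, length):
    """x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b), with H selecting grid cells obs_idx."""
    obs_pts = [grid_pts[k] for k in obs_idx]
    b_go = gaussian_cov(grid_pts, obs_pts, sigma_b2, length)  # B H^T
    b_oo = gaussian_cov(obs_pts, obs_pts, sigma_b2, length)   # H B H^T
    for i in range(len(obs_idx)):
        b_oo[i][i] += sigma_o2                                # + R (uncorrelated obs errors)
    innov = [y - background[k] for y, k in zip(obs_vals, obs_idx)]  # y - H x_b
    w = solve(b_oo, innov)
    return [xb + sum(bij * wj for bij, wj in zip(row, w)) for xb, row in zip(background, b_go)]


# Hypothetical 1 km transect where the InSAR depths (cm) are biased low; all numbers are made up.
grid = [(float(i), 0.0) for i in range(6)]
insar = [10.0, 11.0, 12.0, 12.0, 11.0, 10.0]
stations = [1, 4]
observed = [14.0, 13.0]
analysis = threedvar_analysis(grid, insar, stations, observed, sigma_b2=4.0, sigma_o2=1.0, length=2.0)
print([round(v, 2) for v in analysis])
```

In this toy case the station innovations are positive, so the analysis pulls the low-biased interferometric field upward near the stations. The size of each correction is weighted by the ratio of background to observation error variance and decays with distance according to the assumed correlation length.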