The Tibetan Plateau (TP) is an important component of the global environmental system, and its snow cover strongly influences regional climate and ecology. Moderate Resolution Imaging Spectroradiometer (MODIS) snow cover products have been shown to be suitable for investigating snow cover over the TP. However, they are affected by cloud obscuration, and the TP's extremely complex terrain makes snow monitoring difficult. In this paper, we therefore propose a two-stage spatio–temporal fusion framework for removing clouds from MODIS Collection 6 (C6) snow products. The framework consists of an adjusted Terra and Aqua combination (TAC) followed by a spatio–temporal fusion based on a Gaussian kernel function and error correction (STF-GKF-EC). To the best of our knowledge, this is the first spatio–temporally continuous daily 500-m MODIS normalized difference snow index (NDSI) product generated for the TP, and it substantially improves on the spatial and temporal resolutions of existing snow cover products. The main stage, STF-GKF-EC, adaptively weights spatial and temporal correlations using the Gaussian kernel function and accounts for rapid changes in snow cover through error correction. Experiments showed that STF-GKF-EC removes clouds completely, achieving an overall accuracy (OA) of 91.48% and a mean absolute error (MAE) of 3.88. According to the cloud-removed results for 2001–2017, most snow cover within a year occurs between October and May, peaking in February or March, and this intra-annual variation is mainly controlled by temperature. Inter-annually, NDSI shows a clear increasing trend of 0.68/year before 2005, followed by a slight decreasing trend of 0.16/year; precipitation explains this inter-annual variation better than temperature does.
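The general idea of Gaussian-kernel spatio-temporal gap filling can be illustrated with a minimal Python sketch. It is not the authors' STF-GKF-EC. It assumes a fixed product kernel exp(-ds^2/(2*sigma_s^2)) * exp(-dt^2/(2*sigma_t^2)) over a local space-time window, a hypothetical cloud flag `CLOUD = -1`, and hypothetical parameters `sigma_s`, `sigma_t`, `radius` and `half_window`. It omits both the adaptive weighting and the error-correction step. A basic (unadjusted) Terra-first, Aqua-fill combination stands in for the first stage; the paper's adjusted TAC is not reproduced here.

```python
import numpy as np

# Hypothetical cloud flag for illustration; real MODIS C6 files use their own codes.
CLOUD = -1


def terra_aqua_combine(terra, aqua):
    """Basic (unadjusted) TAC: keep clear Terra pixels and fill
    Terra-cloudy pixels with the Aqua value where Aqua is clear."""
    out = terra.copy()
    fill = (terra == CLOUD) & (aqua != CLOUD)
    out[fill] = aqua[fill]
    return out


def gaussian_kernel_fill(cube, sigma_s=2.0, sigma_t=1.0, radius=3, half_window=2):
    """Fill cloudy pixels of a (time, row, col) NDSI cube with a Gaussian-weighted
    mean of clear observations in a local space-time window (fixed bandwidths)."""
    n_t, n_r, n_c = cube.shape
    out = cube.astype(float).copy()
    for t, r, c in zip(*np.where(cube == CLOUD)):
        # Clip the space-time window to the cube boundaries.
        t0, t1 = max(0, t - half_window), min(n_t, t + half_window + 1)
        r0, r1 = max(0, r - radius), min(n_r, r + radius + 1)
        c0, c1 = max(0, c - radius), min(n_c, c + radius + 1)
        block = cube[t0:t1, r0:r1, c0:c1]
        tt, rr, cc = np.meshgrid(
            np.arange(t0, t1), np.arange(r0, r1), np.arange(c0, c1), indexing="ij"
        )
        ds2 = (rr - r) ** 2 + (cc - c) ** 2  # squared spatial distance (pixels)
        dt2 = (tt - t) ** 2  # squared temporal distance (days)
        # Separable Gaussian weights: nearby pixels and nearby days count most.
        w = np.exp(-ds2 / (2 * sigma_s**2)) * np.exp(-dt2 / (2 * sigma_t**2))
        clear = block != CLOUD
        if clear.any():
            out[t, r, c] = np.sum(w[clear] * block[clear]) / np.sum(w[clear])
        else:
            out[t, r, c] = np.nan  # no clear observation in the window
    return out


if __name__ == "__main__":
    terra = np.array([[50, CLOUD], [CLOUD, 0]])
    aqua = np.array([[40, 60], [CLOUD, 10]])
    print(terra_aqua_combine(terra, aqua).tolist())  # [[50, 60], [-1, 0]]
```

Because each cloudy pixel is estimated as a normalized weighted mean of clear observations, the choice of `sigma_s` and `sigma_t` sets how much the estimate relies on spatial versus temporal neighbors; pixels with no clear observation in the window are left as NaN rather than guessed.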