The fusion of multi-spectral and synthetic aperture radar (SAR) images could retain the advantages of each data, hence benefiting accurate land cover classification. However, some current image fusion methods face the challenge of producing unexpected noise. To overcome the aforementioned problem, this paper proposes a novel fusion method based on weighted median filter and Gram–Schmidt transform. In the proposed method, Sentinel-2A images and GF-3 images are respectively subjected to different preprocessing processes. Since weighted median filter does not strongly blur edges while filtering, it is applied to Sentinel-2A images for reducing noise. The processed Sentinel images are then transformed by Gram–Schmidt with GF-3 images. Two popular methods, principal component analysis method and traditional Gram–Schmidt transform, are used as the comparison methods in the experiment. In addition, random forest, a powerful ensemble model, is adopted as the land cover classifier due to its fast training speed and excellent classification performance. The overall accuracy, Kappa coefficient and classification map of the random forest are used as the evaluation criteria of the fusion method. Experiments conducted on five datasets demonstrate the superiority of the proposed method in both objective metrics and visual impressions. The experimental results indicate that the proposed method can improve the overall accuracy by up to 5% compared to using the original Sentinel-2A and has the potential to improve the satellite-based land cover classification accuracy.