Since hyperspectral satellite images (HSIs) usually have low spatial resolution, improving their spatial resolution is an effective way to exploit their potential in remote sensing applications such as land cover mapping in urban and coastal areas. Fusing HSIs with high-spatial-resolution multispectral images (MSIs) and panchromatic (PAN) images is one possible solution. To address the challenge of fusing HSIs, MSIs and PAN images, a novel, easy-to-implement stepwise fusion approach was proposed in this study. The fusion of HSIs and MSIs was decomposed into a set of simple image fusion tasks through a spectral grouping strategy, and the HSI, MSI and PAN images were then fused step by step using existing image fusion algorithms. Depending on the fusion order, two strategies, (HSI+MSI)+PAN and HSI+(MSI+PAN), were proposed. Using simulated and real Gaofen-5 (GF-5) HSIs, together with MSI and PAN images from the Gaofen-1 (GF-1) PMS sensor, as experimental data, we compared the proposed stepwise fusion strategies with the traditional fusion strategy (HSI+PAN) and compared the performance of six fusion algorithms under the three fusion strategies. The fused results were comprehensively evaluated in three aspects: spectral fidelity, spatial fidelity and computational efficiency. The results showed that (1) the spectral fidelity of the images fused with the stepwise strategies was better than that of the traditional strategy; (2) the stepwise strategies achieved spatial fidelity better than or comparable to that of the traditional strategy; and (3) the stepwise strategies did not significantly increase the computational cost compared with the traditional strategy. In addition, we provide suggestions for selecting image fusion algorithms under the proposed strategies.
This study provides a reference for selecting fusion strategies and algorithms in different application scenarios, as well as an easy-to-implement solution for fusing HSI, MSI and PAN images.
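As a rough illustration of the (HSI+MSI)+PAN order with spectral grouping, the following sketch first sharpens each HSI band with the MSI band of its spectral group and then sharpens the intermediate result with the PAN band. It is a minimal sketch under stated assumptions, not the study's implementation. The grouping rule (assign each HSI band to the MSI band whose range contains its centre wavelength, otherwise to the nearest range centre), the use of high-pass modulation (HPM) as a stand-in for the "existing image fusion algorithms", the integer resolution ratios, the illustrative band ranges and all function names (`group_bands`, `hpm`, `stepwise_fuse`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]  # one band stored as rows of pixel values


def group_bands(hsi_wavelengths: Sequence[float],
                msi_ranges: Sequence[Tuple[float, float]]) -> Dict[int, List[int]]:
    """Hypothetical grouping rule: put each HSI band in the MSI band whose range
    contains its centre wavelength, else in the MSI band with the nearest range centre."""
    groups: Dict[int, List[int]] = {i: [] for i in range(len(msi_ranges))}
    for b, wl in enumerate(hsi_wavelengths):
        inside = [i for i, (lo, hi) in enumerate(msi_ranges) if lo <= wl <= hi]
        if inside:
            groups[inside[0]].append(b)
        else:
            nearest = min(range(len(msi_ranges)),
                          key=lambda i: abs(wl - sum(msi_ranges[i]) / 2))
            groups[nearest].append(b)
    return groups


def upsample(img: Image, f: int) -> Image:
    """Nearest-neighbour upsampling by an integer factor f."""
    return [[img[r // f][c // f] for c in range(len(img[0]) * f)]
            for r in range(len(img) * f)]


def downsample(img: Image, f: int) -> Image:
    """Block-average downsampling by an integer factor f."""
    rows, cols = len(img) // f, len(img[0]) // f
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
             for c in range(cols)] for r in range(rows)]


def hpm(low: Image, high: Image, f: int, eps: float = 1e-6) -> Image:
    """High-pass modulation (stand-in fusion algorithm): scale the upsampled
    low-resolution band by the ratio of the high-resolution band to its degraded version."""
    up_low = upsample(low, f)
    smooth = upsample(downsample(high, f), f)
    return [[up_low[r][c] * high[r][c] / (smooth[r][c] + eps)
             for c in range(len(high[0]))] for r in range(len(high))]


def stepwise_fuse(hsi: List[Image], msi: List[Image], pan: Image,
                  groups: Dict[int, List[int]], f_msi: int, f_pan: int) -> List[Image]:
    """(HSI+MSI)+PAN: step 1 fuses each HSI band with its group's MSI band,
    step 2 fuses every intermediate band with the PAN band."""
    step1: Dict[int, Image] = {}
    for m, bands in groups.items():
        for b in bands:
            step1[b] = hpm(hsi[b], msi[m], f_msi)
    return [hpm(step1[b], pan, f_pan) for b in range(len(hsi))]


if __name__ == "__main__":
    # Illustrative blue, green, red and NIR ranges (nm) for a 4-band MSI.
    ranges = [(450.0, 520.0), (520.0, 590.0), (630.0, 690.0), (770.0, 890.0)]
    print(group_bands([480.0, 550.0, 600.0, 660.0, 850.0], ranges))
    # prints {0: [0], 1: [1, 2], 2: [3], 3: [4]}
```

Because each HSI band is only ever paired with a single MSI band, every step reduces to an ordinary single-band sharpening task, which is why any off-the-shelf fusion routine could be substituted for `hpm`. The HSI+(MSI+PAN) order would simply swap the steps: sharpen the MSI with the PAN first, then fuse the HSI groups with the sharpened MSI.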