In the field of remote sensing, image fusion plays a crucial role in observing the state of global resources and environmental conditions, formulating response strategies, and continuously monitoring and refining those strategies. Currently, most traditional methods exhibit varying degrees of spatial or spectral distortion, and the resulting implausible spectral distributions may convey erroneous ground-feature information. Meanwhile, although deep learning-based methods perform well in terms of fusion quality, their large numbers of parameters impose hardware requirements that prevent deployment in many practical scenarios. These issues hinder the accurate characterization of actual geomorphic and resource conditions and the promotion of sustainable development. To address these issues, we propose a novel recursive self-attention module (RSAM), which consists of two stages: spatial-spectral similarity extraction and self-attention weight generation. The proposed RSAM employs a G2L strategy to capture the global interdependencies between two distinct local locations in the feature map. This design allows spatial and spectral information to be considered simultaneously while emphasizing the mutual information between the spectral and spatial dimensions. We then construct a corresponding residual block (RSARB) from RSAM and cascade RSARBs to form RSANet, a network with a limited number of parameters. Extensive experiments demonstrate that RSANet achieves superior results in both qualitative and quantitative evaluation despite its compact parameter count. This shows that the proposed method possesses robust feature learning capability and is practical for observing and studying the global resource environment. The source code will be publicly available at https://github.com/JUSTM0VE0N/RSANet.
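The abstract describes RSAM as a two-stage module: extracting spatial-spectral similarity between pairs of locations, then turning those similarities into self-attention weights applied through a residual block. The exact architecture is not specified here, so the following is only a minimal NumPy sketch of that general idea; the function names (`rsam_sketch`, `softmax`) and all implementation details are hypothetical, not the authors' actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to turn similarities into weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def rsam_sketch(feat):
    """Hypothetical sketch of one recursive self-attention step.

    feat: (C, H, W) feature map.
    Stage 1: flatten spatial positions and compute the pairwise
             spatial-spectral similarity between every two locations.
    Stage 2: normalize the similarities into attention weights and
             globally reweight the features, then add a residual
             connection (an RSARB-style block would stack such steps).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N): spectrum at each location
    sim = x.T @ x                         # (N, N): similarity of location pairs
    attn = softmax(sim, axis=-1)          # attention weights per location
    out = (x @ attn.T).reshape(c, h, w)   # global aggregation, reshape back
    return feat + out                     # residual connection
```

A full network in the paper's spirit would stack several such residual blocks; the sketch only shows how pairwise spatial-spectral similarity can drive the attention weights.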