Thangka is a form of Tibetan Buddhist painting on fabric or scroll, often depicting deities, scenes, or mandalas. Deep-learning-based super-resolution techniques have been applied to improve the spatial resolution of hyperspectral images (HSIs), particularly for the preservation and analysis of Thangka cultural heritage. However, existing CNN-based methods struggle to preserve spatial information because of registration errors and spectral variability. To overcome these limitations, we present a novel cross-sensor super-resolution (SR) framework that uses high-resolution RGB images (HR-RGBs) to enhance the spectral features in low-resolution hyperspectral images (LR-HSIs). Our approach uses spatial–spectral integration (SSI) blocks and spatial–spectral restoration (SSR) blocks to integrate and reconstruct spatial and spectral features. Furthermore, we introduce a frequency multi-head self-attention (F-MSA) mechanism that treats high-, medium-, and low-frequency features as tokens, enabling self-attention across the frequency dimension. We evaluate our method on a custom dataset of ancient Thangka paintings and demonstrate its effectiveness in enhancing the spectral resolution in high-resolution hyperspectral images (HR-HSIs) while preserving the spatial characteristics of Thangka artwork with minimal information loss.
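
To make the idea of treating frequency bands as attention tokens concrete, the following is a minimal NumPy sketch, not the implementation evaluated in this work. It rests on several assumptions that the text above does not specify: the bands are obtained with radial FFT masks, the cutoffs (0.15 and 0.35 cycles per pixel) are arbitrary, the three band features at each pixel form one token sequence, and the projection matrices `wq`, `wk`, `wv` are random stand-ins for learned weights. The function names `split_frequency_bands` and `frequency_self_attention` are hypothetical.

```python
import numpy as np

def split_frequency_bands(x, cutoffs=(0.15, 0.35)):
    """Split x of shape (C, H, W) into low/mid/high bands (assumed radial FFT masks)."""
    _, h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)
    lo, hi = cutoffs                         # hypothetical cutoffs
    masks = [radius <= lo, (radius > lo) & (radius <= hi), radius > hi]
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    # The masks partition the spectrum, so the three bands sum back to x.
    bands = [np.fft.ifft2(spectrum * m, axes=(-2, -1)).real for m in masks]
    return np.stack(bands)                   # (3, C, H, W)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frequency_self_attention(bands, wq, wk, wv, num_heads=2):
    """Multi-head self-attention over the band axis; each pixel has T=3 tokens."""
    t, c, h, w = bands.shape
    d = c // num_heads                       # channels per head
    n = h * w
    tokens = bands.reshape(t, c, n).transpose(2, 0, 1)   # (N, T, C)

    def to_heads(weight):
        proj = tokens @ weight                           # (N, T, C)
        return proj.reshape(n, t, num_heads, d).transpose(0, 2, 1, 3)  # (N, heads, T, d)

    q, k, v = to_heads(wq), to_heads(wk), to_heads(wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (N, heads, T, T)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(n, t, c)   # merge heads: (N, T, C)
    return out.transpose(1, 2, 0).reshape(t, c, h, w), attn

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                     # toy feature map (C, H, W)
bands = split_frequency_bands(x)
wq, wk, wv = (rng.standard_normal((8, 8)) / np.sqrt(8) for _ in range(3))
mixed, attn = frequency_self_attention(bands, wq, wk, wv)
print(bands.shape, mixed.shape, attn.shape)
```

Because there are only three tokens, each attention matrix is 3 × 3 per pixel and head, so the cost of attending across frequency grows linearly with the number of pixels rather than quadratically, as it would for attention over spatial positions.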