To address the high cost of acquiring hyperspectral data, spectral reconstruction (SR) has emerged as a prominent research area. However, contemporary SR techniques focus more on image processing tasks in computer vision than on practical applications. Furthermore, the prevalent approach of guiding reconstruction with single-dimensional features to reduce computational overhead compromises reconstruction accuracy, particularly in complex environments with intricate ground features and severe spectral mixing. Effectively using both local and global information in the spatial and spectral dimensions therefore remains a significant challenge. To tackle these challenges, this study proposes ICTH, an integrated network of a 3D CNN and a U-shaped Transformer for heterogeneous spectral reconstruction. ICTH comprises a shallow feature extraction module (CSSM) and a deep feature extraction module (TDEM), which together implement a coarse-to-fine spectral reconstruction scheme. To minimize information loss, we design a novel spatial–spectral attention module (S2AM) as the building block of the U-shaped Transformer, enhancing the capture of long-range information across all dimensions. On three hyperspectral datasets, ICTH shows clear strengths in quantitative, qualitative, and single-band detail assessments. On two real-world datasets, it also shows significant potential for subsequent applications, such as generalization to new data and vegetation index calculation.
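As a minimal illustration of the vegetation index application mentioned above, the following sketch computes a normalized difference index from a single reconstructed spectrum. It rests on assumptions that are not taken from this study: the index shown is NDVI, (NIR - Red) / (NIR + Red), which may differ from the indices actually evaluated; the helper names `nearest_band` and `ndvi`, the band centres, and the reflectance values are all hypothetical.

```python
def nearest_band(wavelengths_nm, target_nm):
    """Return the index of the band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths_nm)),
               key=lambda i: abs(wavelengths_nm[i] - target_nm))


def ndvi(spectrum, wavelengths_nm, red_nm=670.0, nir_nm=800.0, eps=1e-8):
    """Compute NDVI for one pixel spectrum.

    The red and near-infrared target wavelengths are illustrative defaults
    (an assumption), and eps only guards against division by zero.
    """
    red = spectrum[nearest_band(wavelengths_nm, red_nm)]
    nir = spectrum[nearest_band(wavelengths_nm, nir_nm)]
    return (nir - red) / (nir + red + eps)


# Hypothetical band centres (nm) and reflectances for two pixels.
wavelengths = [450.0, 550.0, 670.0, 800.0]
vegetation_pixel = [0.04, 0.08, 0.05, 0.45]   # strong NIR reflectance
bare_soil_pixel = [0.10, 0.15, 0.20, 0.25]    # flatter spectrum

print("vegetation NDVI:", round(ndvi(vegetation_pixel, wavelengths), 3))
print("bare soil NDVI:", round(ndvi(bare_soil_pixel, wavelengths), 3))
```

Because such an index depends on only a few bands, comparing it between a reconstructed spectrum and a reference spectrum gives an application-level check that complements whole-spectrum error metrics.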