By capturing the thermal radiation emitted by the Earth's surface, thermal infrared remote sensing data play a pivotal role in domains such as environmental monitoring, resource exploration, agricultural assessment, and disaster early warning. However, acquiring thermal infrared hyperspectral imagery requires more complex, higher-precision sensors, which raises research and operational costs. Spectral reconstruction, which recovers hyperspectral images from more readily acquired multispectral images, offers a lower-cost alternative. In this study, CTBNet, a network built from novel Convolutional Neural Network (CNN)–Transformer combined blocks (CTBs), is proposed to address spectral reconstruction from thermal infrared multispectral images. Within each CTB, an improved self-attention mechanism jointly models features across the spatial and spectral dimensions and explicitly extracts incremental features from each channel. Compared with other algorithms, the spectral curves reconstructed by the proposed method align more closely with the ground-truth curves. Experiments demonstrate the robustness and generalizability of the approach, which outperforms several state-of-the-art algorithms across multiple evaluation metrics.
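The internals of a CTB are not specified here. As a rough, non-authoritative illustration only, the NumPy sketch below pairs a depthwise 3x3 convolution (a CNN branch for local spatial context) with spectral-wise self-attention, in which each spectral band is one token and the attention map is C x C. The `channel_increments` step is a hypothetical reading of "incremental features from each channel" as differences between adjacent bands. All function names, weight shapes, and the residual fusion are assumptions, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv3x3_depthwise(x, kernel):
    """Depthwise 3x3 convolution (cross-correlation) with zero padding.
    x: (C, H, W); kernel: (C, 3, 3), one filter per channel."""
    C, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[:, dy, dx][:, None, None] * padded[:, dy:dy + H, dx:dx + W]
    return out


def channel_increments(x):
    """Hypothetical 'incremental feature': each band minus the previous band
    (the first band has no predecessor, so its increment is zero)."""
    inc = np.zeros_like(x)
    inc[1:] = x[1:] - x[:-1]
    return inc


def spectral_attention(x, wq, wk, wv):
    """Spectral-wise self-attention: one token per channel, with the flattened
    spatial map as its feature vector. wq, wk, wv: (C, C) channel-mixing weights."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W)
    q, k, v = wq @ tokens, wk @ tokens, wv @ tokens  # each (C, H*W)
    # L2-normalize queries and keys so the C x C scores are cosine similarities.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-6)
    attn = softmax(q @ k.T, axis=-1)  # (C, C), each row sums to 1
    return (attn @ v).reshape(C, H, W)


def ctb_sketch(x, params):
    """Hypothetical CNN-Transformer block: local conv branch plus spectral
    attention on increment-augmented input, fused by a residual sum."""
    local = conv3x3_depthwise(x, params["dw_kernel"])
    global_ = spectral_attention(x + channel_increments(x),
                                 params["wq"], params["wk"], params["wv"])
    return x + local + global_


# Usage: random 8-band, 16x16 input with random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
params = {
    "dw_kernel": 0.1 * rng.standard_normal((C, 3, 3)),
    "wq": rng.standard_normal((C, C)) / np.sqrt(C),
    "wk": rng.standard_normal((C, C)) / np.sqrt(C),
    "wv": rng.standard_normal((C, C)) / np.sqrt(C),
}
y = ctb_sketch(x, params)
print(y.shape)  # (8, 16, 16)
```

The attention map here is C x C rather than (H*W) x (H*W), so its cost grows with the number of bands instead of the image size; this is one common motivation for spectral-wise attention in hyperspectral reconstruction, though whether CTBNet uses it is not stated.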