Brain surgery is a widely practised and effective treatment for brain tumours, but accurately identifying and classifying tumour boundaries is crucial to maximise resection and avoid neurological complications. This precision in classification is essential for guiding surgical decisions and subsequent treatment planning. Hyperspectral imaging (HSI) is an emerging multidimensional optical imaging method that captures detailed spectral information across multiple wavelengths, allowing the identification of nuanced differences in tissue composition, with the potential to enhance intraoperative tissue classification. However, current frameworks often require retraining models for each hyperspectral image to extract meaningful features, resulting in long processing times and high computational costs. Additionally, most methods utilise only the deep semantic features at the end of the network for classification, ignoring the spatial details contained in the shallow features. To overcome these challenges, we propose a novel approach called MedDiffHSI, which combines diffusion and transformer techniques. Our method involves training an unsupervised learning framework based on the diffusion model to extract high‐level and low‐level spectral–spatial features from HSI. This approach eliminates the need to retrain the spectral–spatial feature learning model for each image, thereby reducing time complexity. We then extract intermediate multistage features from different timesteps of a pretrained denoising U‐Net for classification. To fully explore and exploit the rich contextual semantics and textural information hidden in the extracted diffusion features, we utilise a spectral–spatial attention module. This module not only learns multistage information from features at different depths, but also extracts and enhances the effective information they contain. Finally, we employ a supervised transformer‐based classifier with weighted majority voting (WMV) to perform the HSI classification. To validate our approach, we conduct comprehensive experiments on in vivo brain HSI data sets and extend the analysis to additional HSI data sets for breast cancer to evaluate the framework's performance across different tissue types. The results demonstrate that our framework outperforms existing approaches while using minimal training samples (5%), achieving state‐of‐the‐art performance.
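As a concrete illustration of the weighted majority voting (WMV) step mentioned above, the following minimal sketch aggregates per-voter class predictions using confidence weights. The function name, weighting scheme and example values are assumptions made for exposition and are not taken from the paper.

```python
# Minimal sketch of weighted majority voting (WMV) over class predictions,
# e.g. from overlapping patches or repeated inferences for the same pixel.
# The weighting scheme and example values are illustrative assumptions.
import numpy as np

def weighted_majority_vote(predictions: np.ndarray, weights: np.ndarray, n_classes: int) -> int:
    """Return the class label whose accumulated weight is largest.

    predictions : (n_voters,) integer class labels produced by individual voters
    weights     : (n_voters,) non-negative weights (e.g. classifier confidences)
    """
    scores = np.zeros(n_classes)
    for label, weight in zip(predictions, weights):
        scores[label] += weight      # each voter adds its weight to its predicted class
    return int(np.argmax(scores))    # class with the highest total weight wins

# Example: three voters predict classes 0, 2 and 2 with weights 0.5, 0.3 and 0.4;
# class 2 accumulates 0.7 > 0.5, so it is returned.
print(weighted_majority_vote(np.array([0, 2, 2]), np.array([0.5, 0.3, 0.4]), n_classes=4))
```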