The relationship between a metasurface structure and its absorption spectrum is inherently complex because of the intricate physical interactions involved. Simulating this relationship with Maxwell’s equations is also computationally expensive, which significantly hinders rapid development in this area. Numerous researchers have employed artificial intelligence (AI) models to predict absorption spectra. However, these models often act as black boxes: even when a model achieves high performance, it remains difficult to verify whether it is fitting rational patterns or merely guessing outcomes. To address these challenges, we introduce the Explainable Encoder–Prediction–Reconstruction (EEPR) framework, which separates the prediction process into feature extraction and spectra generation. This separation facilitates a deeper understanding of the physical relationships between metasurface structures and spectra and unveils the model’s operations at the feature level. Our model achieves a Mean Square Error (MSE) of 2.843 × 10⁻⁴, a 66.23% reduction relative to the average MSE of 8.421 × 10⁻⁴ for mainstream networks. It also runs approximately 500,000 times faster than traditional simulations based on Maxwell’s equations, requiring 3 × 10⁻³ seconds per sample, and demonstrates excellent generalization capabilities. With the EEPR framework, we achieve feature-level explainability and offer insights into the physical properties and their impact on metasurface structures, going beyond the pixel-level explanations provided by existing research. We further demonstrate that absorption can be adjusted by changing the metasurface at the feature level. These insights can potentially help designers refine structures and strengthen their trust in AI applications.
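As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the relative MSE reduction from the two reported MSE values and derives the per-sample simulation time implied by the stated speedup. The function and variable names are illustrative only. The implied simulation time is an inference from the reported numbers, not a value stated in the text, and because the inputs are rounded, the recomputed reduction may differ from the reported 66.23% in the last digit.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Values quoted in the abstract.
baseline_mse = 8.421e-4    # average MSE of mainstream networks
eepr_mse = 2.843e-4        # MSE of the EEPR model
inference_time_s = 3e-3    # EEPR prediction time per sample, in seconds
speedup = 500_000          # approximate speedup over Maxwell-based simulation

reduction = relative_reduction(baseline_mse, eepr_mse)

# Assumption: the speedup is a simple ratio of per-sample times, so the
# simulation time it implies is inference time multiplied by the speedup.
implied_simulation_time_s = inference_time_s * speedup

print(f"Relative MSE reduction: {reduction:.2%}")
print(f"Implied simulation time: {implied_simulation_time_s:.0f} s per sample")
```

Under that assumption, the stated speedup corresponds to a conventional simulation taking on the order of 25 minutes per sample.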