Spectral computed tomography (CT) reconstructs the same scanned object from projections acquired in multiple narrow energy windows, and it can be used for material identification and decomposition. However, the multi-energy projection dataset has a low signal-to-noise ratio (SNR), which results in poor reconstructed image quality. To address this problem, we develop a spectral CT reconstruction method, namely the spatial-spectral cube matching frame (SSCMF). The method is inspired by the following three facts: i) the human body usually consists of two or three basic materials, implying that the reconstructed spectral images are strongly sparse; ii) the same basic material component in a single-channel image has similar intensity and structures in local regions, and different material components within the same energy channel share similar structural information; iii) multi-energy projection datasets are collected from the subject using different narrow energy windows, which means that images reconstructed from different energy channels share similar structures. To exploit this information, we first establish a tensor cube matching frame (CMF) for a BM4D denoising procedure. Then, as a new regularizer, the CMF is introduced into a basic spectral CT reconstruction model, generating the SSCMF method. Because the SSCMF model contains an L0-norm minimization of 4D transform coefficients, an effective strategy is employed for optimization. Both numerical simulations and realistic preclinical mouse studies are performed. The results show that the SSCMF method outperforms state-of-the-art algorithms, including the simultaneous algebraic reconstruction technique, total variation minimization, total variation plus low rank, and tensor dictionary learning.
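As a rough illustration of why an L0-norm penalty on transform coefficients is tractable, the sketch below shows elementwise hard thresholding, the standard closed-form minimizer of 0.5*(x - c)^2 + lam*[x != 0]. It is not the optimization strategy of the SSCMF method, which is not specified here; the names `hard_threshold`, `l0_objective`, `coeffs`, and `lam` are hypothetical, and the coefficients are made-up numbers rather than the output of a real 4D transform.

```python
import math


def hard_threshold(coeffs, lam):
    """Elementwise minimizer of 0.5*(x - c)**2 + lam*(x != 0).

    A coefficient is kept unchanged when |c| > sqrt(2*lam) and set to zero
    otherwise. `coeffs` stands in for transform coefficients of a group of
    matched cubes (an assumption made for illustration only).
    """
    tau = math.sqrt(2.0 * lam)
    return [c if abs(c) > tau else 0.0 for c in coeffs]


def l0_objective(x, c, lam):
    """Per-coefficient cost: quadratic data fit plus an L0 penalty."""
    return 0.5 * (x - c) ** 2 + (lam if x != 0 else 0.0)


# Made-up coefficients; lam = 0.5 gives a threshold of sqrt(2 * 0.5) = 1.0.
coeffs = [3.0, -0.2, 0.9, -1.5, 0.05]
lam = 0.5
sparse = hard_threshold(coeffs, lam)
print(sparse)
```

In this toy setting only the two coefficients whose magnitude exceeds the threshold survive, which is the sparsifying effect that an L0 penalty on grouped transform coefficients is meant to produce.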