Music emotion classification is becoming an important research direction because of its significance for music information retrieval (MIR). In this task, fully extracting emotion-related features from the raw music audio is key to improving classification accuracy. In this paper, we propose a music feature fusion attention (MFFA) model that mines emotional features from music more effectively. The model combines a feature fusion attention (FFA) module and a bidirectional gated recurrent unit (BiGRU) module to extract emotional features along both the spatial and temporal dimensions. First, the log Mel-spectrogram of the music audio is fed into the FFA module, which serves as a feature extractor and obtains more comprehensive and effective feature information through multi-scale feature fusion and multi-layer attention mechanisms. Global and local residual connections within the FFA module further help it learn features comprehensively. The BiGRU module then captures the temporal dependencies of the music sequence, and the spatial and temporal features are fused by concatenation. Experimental results show that the proposed model outperforms five baseline models by 1.2% to 7.9%. Ablation experiments further confirm the effectiveness of combining the FFA and BiGRU modules.
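The pipeline described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names (`FFABlock`, `MFFASketch`, etc.), the kernel sizes (3, 5, 7), channel counts, number of blocks, the squeeze-and-excitation-style channel attention and per-position pixel attention, the frequency pooling before the BiGRU, and the choice to run the BiGRU on the FFA output (rather than on the raw spectrogram) are all hypothetical choices, since the abstract does not specify them.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Assumed form: squeeze-and-excitation style reweighting of channels."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(1, channels // reduction)
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average over (freq, time)
            nn.Conv2d(channels, mid, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.net(x)  # one weight per channel


class PixelAttention(nn.Module):
    """Assumed form: one weight per time-frequency position."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(1, channels // reduction)
        self.net = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.net(x)  # (B, 1, F, T) map broadcast over channels


class FFABlock(nn.Module):
    """Hypothetical block: multi-scale convs fused, then attention, plus a local residual."""

    def __init__(self, channels: int):
        super().__init__()
        # Parallel branches with different receptive fields (multi-scale features)
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)  # 1x1 conv fuses the scales
        self.ca = ChannelAttention(channels)
        self.pa = PixelAttention(channels)

    def forward(self, x):
        y = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        y = self.pa(self.ca(self.fuse(y)))
        return x + y  # local residual connection


class MFFASketch(nn.Module):
    """Hypothetical MFFA-style model: FFA features -> BiGRU -> concatenation -> classifier."""

    def __init__(self, channels=16, hidden=32, num_classes=4, num_blocks=2):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[FFABlock(channels) for _ in range(num_blocks)])
        self.gru = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(channels + 2 * hidden, num_classes)

    def forward(self, logmel):  # logmel: (batch, 1, n_mels, time)
        x = self.stem(logmel)
        x = x + self.blocks(x)  # global residual connection
        spatial = x.mean(dim=(2, 3))  # (batch, channels)
        seq = x.mean(dim=2).transpose(1, 2)  # pool frequency -> (batch, time, channels)
        _, h = self.gru(seq)  # h: (2, batch, hidden), forward and backward final states
        temporal = torch.cat([h[0], h[1]], dim=1)  # (batch, 2 * hidden)
        return self.head(torch.cat([spatial, temporal], dim=1))  # fuse by concatenation


model = MFFASketch()
logits = model(torch.randn(2, 1, 64, 100))  # 2 clips, 64 Mel bands, 100 frames
print(logits.shape)  # torch.Size([2, 4])
```

Pooling over frequency before the BiGRU keeps the recurrent input size equal to the channel count, so the sketch accepts any number of Mel bands; a real model might instead flatten frequency into the feature dimension.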