Facial Expression Recognition (FER) is a fundamental computer vision task: classifying human face images into emotion categories such as happy, sad, surprised, scared, and angry. Deep-learning-based FER has made great progress in recent years. However, regardless of the weight initialization technique or attention mechanism used, deep-learning-based FER methods still struggle to capture features that are visually insignificant but semantically important. To address this problem, we present a novel FER training strategy consisting of two components: Memo Affinity Loss (MAL) and Mask Attention Fine Tuning (MAFT). MAL is a variant of center loss that uses a memory bank strategy together with discriminative centers. It widens the distance between different clusters and narrows the distance within each cluster, so the features extracted by the CNN are comprehensive and independent, which yields a more robust model. MAFT temporarily masks the regions the model attends to and forces it to learn from other important regions of the input image. It is not only an augmentation technique but also a novel fine-tuning approach. To the best of our knowledge, we are the first to apply masking to the attended regions and to use this strategy for fine-tuning. To implement these ideas, we construct a new network based on ResNet-18, named Architecture Attention ResNet. Our methods are conceptually and practically simple, yet achieve superior results on popular public FER benchmarks: 88.75% on RAF-DB, 65.17% on AffectNet-7, and 60.72% on AffectNet-8. The code will be open-sourced soon.
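The masking idea behind MAFT can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_top_attention`, the `ratio` and `fill` parameters, the use of a 2D attention grid with the same shape as the image, and the "mask the top fraction of cells" rule are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def mask_top_attention(image: Grid, attention: Grid,
                       ratio: float = 0.25, fill: float = 0.0) -> Grid:
    """Return a copy of `image` with its highest-attention cells set to `fill`.

    Hypothetical helper: `attention` is assumed to be a 2D grid of scores
    with the same shape as `image`; `ratio` is the fraction of cells to hide.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    height, width = len(image), len(image[0])

    # Rank every cell coordinate by its attention score, highest first.
    coords = [(r, c) for r in range(height) for c in range(width)]
    coords.sort(key=lambda rc: attention[rc[0]][rc[1]], reverse=True)

    # Hide the top `ratio` fraction of cells (rounded down).
    n_masked = int(ratio * len(coords))
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for r, c in coords[:n_masked]:
        masked[r][c] = fill
    return masked


if __name__ == "__main__":
    img = [[1.0, 2.0], [3.0, 4.0]]
    att = [[0.9, 0.1], [0.2, 0.3]]
    print(mask_top_attention(img, att, ratio=0.25))  # [[0.0, 2.0], [3.0, 4.0]]
```

In a fine-tuning loop, the masked images would presumably be fed back to the model in place of the originals, so that the loss can only be reduced by using regions outside the previously attended area.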