This paper proposes a new facial Expression recognition network, the multi‐scale feature Fusion Convolutional neural Network (EFCN), to address two problems in facial expression recognition. First, faces from different expression categories share many common features, and recognition becomes unreliable when these commonalities outweigh the distinguishing features of each category. Second, fine facial details strongly affect the final recognition result, yet traditional convolutional neural networks do not extract enough image detail. To address these issues, we design a feature enhancement network (FEN) and a detail information enhancement module (DEM). The FEN fuses deep and shallow features, so the resulting feature map carries richer information and samples become easier to distinguish. The DEM extracts multi‐scale features and fuses them with the features passed from the backbone network, strengthening the network's ability to capture features from small regions of the face. We validated the proposed method on three datasets, RAF‐DB, CK+, and JAFFE, achieving accuracies of 84.50%, 97.86%, and 91.05%, respectively. These results demonstrate the effectiveness of the proposed method; for example, on the JAFFE dataset its recognition accuracy surpasses that of the MLT method by 1.87%.
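To make the idea of fusing deep and shallow features concrete, the following minimal sketch shows one common way such a fusion can be done: a low-resolution deep feature map is upsampled to the resolution of a shallow map and the two are concatenated along the channel axis. This is an illustration under stated assumptions, not the FEN itself. The function names `upsample_nearest` and `fuse_deep_shallow`, the nearest-neighbor upsampling, the concatenation rule, and the toy values are all hypothetical choices; the abstract does not specify how the FEN combines its inputs.

```python
# Illustrative sketch only: NOT the FEN implementation from the paper.
# Feature maps are plain nested lists shaped [channels][height][width].

def upsample_nearest(fmap, factor):
    """Enlarge each channel by repeating every value `factor` times per axis."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            # Repeat each value horizontally, then repeat the widened row vertically.
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


def fuse_deep_shallow(shallow, deep):
    """Assumed fusion rule: upsample the deep map, then concatenate channels."""
    factor = len(shallow[0]) // len(deep[0])
    deep_up = upsample_nearest(deep, factor)
    if (len(deep_up[0]) != len(shallow[0])
            or len(deep_up[0][0]) != len(shallow[0][0])):
        raise ValueError("spatial sizes must match after upsampling")
    # List concatenation stacks the two maps along the channel axis.
    return shallow + deep_up


# Toy inputs (made-up values): a 1-channel 4x4 shallow map, a 1-channel 2x2 deep map.
shallow = [[[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]]
deep = [[[0.5, -1.0],
         [2.0, 3.0]]]

fused = fuse_deep_shallow(shallow, deep)
print(len(fused), len(fused[0]), len(fused[0][0]))  # prints: 2 4 4
```

In this sketch the fused map keeps the fine spatial layout of the shallow features in its first channel and adds the coarser deep features as a second channel, which is one way a feature map can come to "contain richer information". In a real network the concatenation would typically be followed by a learned convolution, which is omitted here.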