Facial expression is an important way for people to express their emotions. Recently, with the advancement of the computer vision, facial expression recognition has become a current research hotspot and has made significant progress, which can be utilized in the interaction between humans and computers, emotional computing and other computer vision fields, the evolution of artificial intelligence and deep learning has better promoted the research of facial expression recognition. Conventional approaches are largely based on machine learning, which leverages artificial way for feature extraction. The extracted facial expression features are interfered by human factors, so that the trained classifier cannot effectively interpret the expression information, which ultimately leads to insufficient model generalization ability and low recognition accuracy. However, deep learning-based facial expression recognition in real scenes is still affected by factors such as human pose, different degrees of facial occlusion, background environment and light interference, and the recognition accuracy still needs to be further improved. This paper focuses on representative convolutional neural network architectures and summarizes the strengths, weaknesses, and innovations of these networks, through which the advancement of neural network architectures shows the great potential of the direction.