The performance of facial expression recognition (FER) tends to deteriorate because of high intraclass variations and high interclass similarities. To address this problem, we propose an expression recognition model based on a joint partial image and deep metric learning method (PI&DML). First, we crop the action units (AUs) most closely related to the expression to generate a partial image for feature extraction, which helps mitigate the negative impact of these two problems. Second, we introduce a novel expression metric loss function (EMLF) to enhance intraclass similarity and interclass variation. Finally, jointly optimizing the expression metric loss and the classification loss yields superior performance. Visualization results show that the proposed EMLF increases the distance between different expressions and reduces the distance between samples of the same expression. Evaluations on three public expression databases show that our method achieves better results than state-of-the-art methods.

INDEX TERMS Facial expression recognition, deep metric learning, metric loss function, partial images, jointly optimizing, high intraclass variations, high interclass similarities.
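
To make the idea of jointly optimizing a metric loss and a classification loss concrete, the following is a minimal sketch in plain Python. It is not the EMLF proposed in this paper, whose exact form is not given in the abstract. A generic contrastive-style pairwise loss is used as a stand-in, and the names `pairwise_metric_loss`, `joint_loss`, `margin`, and `lam` are all hypothetical, chosen only for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_metric_loss(embeddings, labels, margin=1.0):
    """Generic contrastive-style stand-in, NOT the paper's EMLF (assumption).

    Same-class pairs are penalized by their squared distance (pulls them
    together); different-class pairs are penalized only when they are closer
    than `margin` (pushes them apart).
    """
    total, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sq_dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d2
            else:
                total += max(0.0, margin - math.sqrt(d2)) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def joint_loss(logits_batch, embeddings, labels, lam=0.5):
    """Mean classification loss plus a weighted metric loss.

    `lam` is a hypothetical trade-off weight; the abstract does not state
    how the two terms are balanced.
    """
    ce = sum(cross_entropy(z, y) for z, y in zip(logits_batch, labels)) / len(labels)
    return ce + lam * pairwise_metric_loss(embeddings, labels, 1.0)
```

In this sketch the metric term is zero when same-class features coincide and different-class features are at least `margin` apart, so minimizing the sum rewards both correct classification and a well-separated feature space.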