The small scale of available expression samples generally degrades the performance of facial expression recognition (FER) methods. In addition, the correlation between different expressions is typically ignored during feature extraction. To address both issues, we propose a novel FER approach based on joint dictionary learning guided by multi-class differentiation feature representation. The proposed approach consists of two main steps. First, we construct multi-class differentiation feature dictionaries corresponding to the different expressions of the training samples, aiming to enlarge the inter-expression distance and thereby mitigate the problem of nonlinear distribution in the training samples. Second, we jointly learn the multiple feature dictionaries by optimizing the resolutions of each feature dictionary, aiming to establish a strong relationship among the feature dictionaries and enhance their representation ability. As a result, the proposed approach is more discriminative from the representation perspective. Comprehensive experiments on three public datasets, JAFFE, CK+, and KDEF, demonstrate that the proposed approach achieves strong performance on small-scale samples compared with several state-of-the-art methods.