Facial Expression Recognition (FER) is one of the most important research problems in computer vision and Artificial Intelligence (AI) due to its potential applications. Many studies have been proposed for FER, whether based on handcrafted (craft) features with traditional machine learning techniques or on end-to-end convolutional neural networks (CNNs). In this paper, we propose a new model called CNNCraft-net that combines the advantages of CNN-based and traditional models by concatenating feature outputs from a CNN, an autoencoder, and handcrafted features such as the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB), computed with the bag of visual words (BOVW) representation, to recognize eight facial expressions in static RGB images. For the comparative analysis, multiple metrics were used, including accuracy, loss, F-measure, precision, and recall. The highly imbalanced AffectNet and FER2013 datasets were used to evaluate the proposed model, which achieves an accuracy of 61.9% for eight expressions and 65% for seven expressions on AffectNet, and 69% on FER2013.
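To make the feature-fusion idea concrete, the sketch below illustrates one way BOVW histograms of handcrafted descriptors can be concatenated with deep features before classification. It is a minimal illustration, not the authors' implementation: `cnn_features` and `autoencoder_features` are hypothetical placeholders standing in for the learned extractors, SURF is omitted because it requires opencv-contrib, the vocabulary sizes and random demo images are arbitrary, and real face crops would be used in practice.

```python
# Minimal sketch (assumptions noted above): separate BOVW histograms for SIFT and
# ORB descriptors are concatenated with placeholder CNN / autoencoder features.
import cv2
import numpy as np
from sklearn.cluster import KMeans


def bovw_histograms(gray_images, detector, n_words=32):
    """Fit a k-means visual vocabulary on all descriptors, then represent each
    image as a normalized histogram of its descriptors over the visual words."""
    all_desc, per_image = [], []
    for img in gray_images:
        _, desc = detector.detectAndCompute(img, None)
        desc = (np.zeros((0, detector.descriptorSize()), np.float32)
                if desc is None else desc.astype(np.float32))
        per_image.append(desc)
        all_desc.append(desc)
    stacked = np.vstack(all_desc)
    k = min(n_words, len(stacked))  # guard against very few descriptors
    vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(stacked)
    hists = []
    for desc in per_image:
        hist = np.zeros(k, np.float32)
        if len(desc):
            for w in vocab.predict(desc):
                hist[w] += 1
            hist /= hist.sum()
        hists.append(hist)
    return np.stack(hists)


def cnn_features(images):
    # Hypothetical placeholder for a CNN backbone's penultimate-layer activations.
    return np.random.rand(len(images), 256).astype(np.float32)


def autoencoder_features(images):
    # Hypothetical placeholder for an autoencoder's bottleneck code.
    return np.random.rand(len(images), 64).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random grayscale stand-ins; replace with real face crops in practice.
    gray = [rng.integers(0, 255, (96, 96), dtype=np.uint8) for _ in range(8)]
    sift_hist = bovw_histograms(gray, cv2.SIFT_create())
    orb_hist = bovw_histograms(gray, cv2.ORB_create())
    # Concatenate handcrafted BOVW histograms with deep features before the classifier.
    fused = np.concatenate([cnn_features(gray), autoencoder_features(gray),
                            sift_hist, orb_hist], axis=1)
    print(fused.shape)
```

In this kind of design, the fused vector would feed the classification layers that predict the expression label; the random placeholders above only mark where the learned CNN and autoencoder features would plug in.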