2021
DOI: 10.1109/access.2021.3069881
Shape and Texture Aware Facial Expression Recognition Using Spatial Pyramid Zernike Moments and Law’s Textures Feature Set

Abstract: Facial expression recognition (FER) requires better descriptors to represent face patterns, because the facial region changes with the movement of the facial muscles during an expression. In this paper, a method of concatenating spatial pyramid Zernike moments based shape features and Law's texture features is proposed to uniquely capture the macro and micro details of each facial expression. The proposed method employs multilayer perceptron and radial basis function feed-forward artificial neural networks for r…
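The two feature families named in the abstract can be illustrated with a minimal sketch. The five 1-D kernels below are the standard Laws texture kernels; the 15×15 energy window, the pyramid levels, and the per-cell averaging are illustrative assumptions, since the abstract does not state the paper's exact configuration. For brevity the spatial pyramid is shown with the texture descriptor as its per-cell function, whereas the paper applies it to Zernike moment shape features:

```python
import numpy as np
from scipy.ndimage import convolve

# Laws' five classic 1-D kernels: Level, Edge, Spot, Ripple, Wave.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)
R5 = np.array([1, -4, 6, -4, 1], dtype=float)
W5 = np.array([-1, 2, 0, -2, 1], dtype=float)

def laws_energy_features(img, window=15):
    """Mean texture energy for each of the 25 Laws masks formed by
    outer products of the 1-D kernels (window size is an assumption)."""
    kernels = [L5, E5, S5, R5, W5]
    feats = []
    for a in kernels:
        for b in kernels:
            mask = np.outer(a, b)
            resp = convolve(img.astype(float), mask, mode="reflect")
            # texture energy: local average of the absolute filter response
            box = np.ones((window, window)) / window**2
            energy = convolve(np.abs(resp), box, mode="reflect")
            feats.append(energy.mean())
    return np.array(feats)  # 25-D texture descriptor

def spatial_pyramid(img, descriptor, levels=(1, 2, 4)):
    """Concatenate descriptors computed over an LxL grid of cells for
    each pyramid level (1x1, 2x2, 4x4 here; levels are an assumption)."""
    h, w = img.shape
    parts = []
    for L in levels:
        for i in range(L):
            for j in range(L):
                cell = img[i*h//L:(i+1)*h//L, j*w//L:(j+1)*w//L]
                parts.append(descriptor(cell))
    return np.concatenate(parts)
```

The concatenated pyramid vector (here 21 cells × 25 features = 525-D) would then be fed to the MLP or RBF network mentioned in the abstract.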

Cited by 25 publications (11 citation statements)
References 57 publications
“…For performance evaluation, we compared our two proposed approaches with the state-of-the-art methods that previously used the KDEF database (Table 4): (1)–(2) a pre-trained AlexNet deep convolutional neural network whose classification relies on one of two proposed feature-selection schemes, either selecting facial action units by training a binary action-unit detector for every feature map and sorting them [49], or detecting the feature maps within facial regions found by a deconvolutional neural network [49]; this selection of feature maps influences classification robustness, but both schemes achieved accuracy 10.2–12% lower than our first proposed approach and 10.9–12.7% lower than our second; (3) a multi-channel pose-aware convolutional neural network for multi-view facial expression recognition [50], with accuracy 11.5–12.2% lower than that of our approaches; (4) a CNN [51] pre-trained with a deep stacked convolutional auto-encoder (DSCAE) that generates a feature vector for expression recognition while overcoming the illumination problem; it achieved better accuracy than the other five state-of-the-art methods, but still 2.9–3.6% lower than ours; (5) adding gradient and Laplacian inputs to the image fed to a CNN [52], which helps recognize facial expressions but with accuracy 10.2–10.9% lower than our first and second approaches; (6) using a Haar classifier before the deep neural network, which reduces convergence time compared with networks without it and achieved the best accuracy among the other state-of-the-art methods, though still 1.8–2.5% lower than our first and second approaches [53]; (7) a radial basis function neural network [54] that integrates shape descriptors and texture features for expression recognition, with accuracy 9.6–10.3% lower than our first and second approaches.…”
Section: Results
confidence: 98%
“…system and achieved an accuracy of 96.6% on the KDEF dataset. M. Vijayalakshmi et al. [54] proposed a radial basis function neural network that integrates shape and texture feature descriptors for expression recognition and achieved an accuracy of 94.2% on the KDEF dataset. B. Hasani et al. [55] proposed a technique that uses a 3D CNN with 3D Inception residual network layers to extract spatial relations within facial images, and an LSTM (long short-term memory) to extract temporal relations across consecutive video-sequence frames.…”
Section: Related Work
confidence: 99%
“…Our proposed hybrid approach is compared on the KDEF dataset with recent state-of-the-art FER methods, including MULTICNN [28], HDNN [38], RTCNN [29], ALEXNET+LDA [41], MSLBP+SVM [23], DL-FER [34], and RBFNN [26], which use either machine learning methods alone (without a CNN) or a combined model of CNN and SVM methods. Table 5 reports our experimental results and shows the comparison with these methods.…”
Section: Results
confidence: 99%
“…Table 5 reports our experimental results and shows the comparison with these methods. In Table 5, MSLBP+SVM [23] and RBFNN [26] use machine learning methods for both feature extraction and classification, whereas DL-FER [34], MULTICNN [28], and RTCNN [29] use deep neural networks alone. ALEXNET+LDA [41] and HDNN [38] combine a deep neural network for feature extraction with machine learning techniques for classification.…”
Section: Results
confidence: 99%
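The hybrid framework this excerpt describes (a deep network for feature extraction, classical machine learning for classification) can be sketched as follows. This is a loose illustration of the ALEXNET+LDA-style pairing, not the cited papers' implementation: the random feature matrix is a stand-in for pre-trained CNN activations, and the LDA-plus-SVM combination and all dimensions are assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in for deep features: in an ALEXNET+LDA-style pipeline these would
# be activations from a pre-trained CNN's penultimate layer (hypothetical).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(210, 256))   # 256-D placeholder "CNN" features
y_train = np.arange(210) % 7            # 7 basic expression classes

# deep features -> LDA for discriminative dimensionality reduction -> SVM
clf = make_pipeline(
    LinearDiscriminantAnalysis(n_components=6),  # at most n_classes - 1
    SVC(kernel="rbf"),
)
clf.fit(X_train, y_train)
pred = clf.predict(X_train[:5])
```

The design point the comparison draws out is exactly this split: the CNN (or hand-crafted descriptor) supplies the representation, while a compact classical model does the final decision.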