This work addresses the recognition of facial emotion from video sequences using deep learning. Once the input video is converted into frames, face detection is performed on each frame using the Viola-Jones face detection algorithm. Feature extraction is then performed with three well-performing techniques: the modified local directional pattern (M-LDP), spatio-temporal features, and the scale-invariant feature transform (SIFT). The features extracted from all frames of the video are concatenated. To shorten the feature vector, and thereby reduce training complexity and improve recognition performance, optimal feature selection is carried out with the distance-based tunicate swarm algorithm (D-TSA). The selected features are fed to an innovative deep learning model termed the heuristically modified recurrent neural network (HM-RNN), in which the same D-TSA improves the performance of the recurrent neural network (RNN) by optimally tuning its hidden neurons. Experimental results on a widely used benchmark dataset and a manually collected dataset show that classification performance improves with spatio-temporal features, SIFT, M-LDP, and optimal feature selection, and that the proposed model with HM-RNN therefore outperforms the other existing models.
KEYWORDS
distance-based tunicate swarm algorithm, face expression recognition in the video, heuristically modified recurrent neural network, Viola-Jones face detection algorithm
INTRODUCTION
Human emotions are conveyed through facial expressions, which are analyzed to identify an individual's nature and intentions. 1 Hence, understanding these facial expressions is a persistent issue in the area of human-computer interaction (HCI). 2 Facial analysis is therefore broadly researched in computer vision, where applications such as facial expression recognition and face recognition receive more attention in the improvement of network architectures. 3 Face expression recognition provides automated analysis for practical applications that recognize facial expressions in videos, and it influences the interaction interface in human-robot interaction, the study of communication between humans and robots. Recent facial recognition approaches analyze video frames in terms of the six basic expressions and also evaluate the neutral expression in the frames. 4 Similarly, face recognition methods are considered a significant biometric technique for authentication in many fields. 5 These methods use dynamic features, such as variations in facial texture and facial landmarks, that provide salient information about emotional status. Thus, extracting the dynamic information from the whole video sequence is essential for recognizing facial expressions. Facial expression analysis systems are employed to categorize an image or video sequence into one of the six basic emotions. 6
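To make the idea of using information from the whole video sequence concrete, the following minimal Python sketch joins per-frame feature vectors into one video-level vector and then keeps a subset of its entries with a binary mask. It is only an illustration under stated assumptions: the function names, the toy 4-dimensional descriptors, and the hand-written mask are hypothetical, and the mask merely stands in for the output of a feature selection step. The sketch does not implement D-TSA or any of the feature extractors.

```python
from typing import List, Sequence


def concatenate_frame_features(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Join the feature vectors of all frames into one video-level vector."""
    video_vector: List[float] = []
    for frame_vector in per_frame:
        video_vector.extend(frame_vector)
    return video_vector


def select_features(vector: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep only the entries whose mask bit is 1."""
    if len(vector) != len(mask):
        raise ValueError("mask must have one entry per feature")
    return [value for value, keep in zip(vector, mask) if keep]


# Toy input (assumption): three frames, each with a 4-dimensional descriptor.
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
video_vector = concatenate_frame_features(frames)

# Hypothetical binary mask standing in for the result of a selection step.
mask = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
selected = select_features(video_vector, mask)

print(len(video_vector), len(selected))
```

Representing the selection as a binary mask over the concatenated vector keeps the selection step independent of how the mask is found, so any optimizer that proposes such a mask could be plugged in.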