Background
The emotional state of an individual is difficult to identify automatically, and interest in emotion recognition has grown rapidly in recent years. Many technologies have been developed to identify emotional expression from facial expressions, vocal expressions, physiological signals, and body expressions. Among these, facial expression is especially informative for recognition using multiple modalities. Understanding facial emotions has applications in mental well-being, decision-making, and even social change, as emotions play a crucial role in our lives. Recognition is complicated by the high dimensionality of the data and the non-linear interactions across modalities. Moreover, the way people express emotion varies widely, so identifying discriminative features remains challenging; deep learning models help overcome these limitations.

Methods
This work addresses facial emotion recognition with a deep learning model, the proposed Residual Fused-Graph Convolution Network (RF-GCN). The multimodal input comprises video and an Electroencephalogram (EEG) signal. A Non-Local Means (NLM) filter is used to pre-process the input video frames. Features are extracted from both the pre-processed video frames and the input EEG signals, after which feature selection is carried out using the chi-square test. Finally, facial emotion and its type are determined by RF-GCN, which combines a Deep Residual Network (DRN) with a Graph Convolutional Network (GCN).

Results
RF-GCN was evaluated using accuracy, recall, and precision, achieving superior values of 91.6%, 96.5%, and 94.7%, respectively.

Conclusions
RF-GCN captures the nuanced relationships between different emotional states and improves recognition accuracy. The model is trained and evaluated on a dataset that reflects real-world conditions.
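The NLM pre-processing step mentioned in the Methods can be illustrated with a minimal sketch: each pixel is replaced by a weighted average of pixels in a search window, with weights based on patch similarity. This is a generic, numpy-only toy implementation (the `patch`, `search`, and `h` parameters are illustrative defaults, not the paper's settings).

```python
import numpy as np

def nlm_denoise(img, patch=1, search=3, h=0.1):
    """Minimal Non-Local Means on a 2-D grayscale image: each pixel is a
    weighted average over a search window, weighted by patch similarity."""
    img = np.asarray(img, dtype=float)
    pad = patch + search
    padded = np.pad(img, pad, mode='reflect')
    H, W = img.shape
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - patch:ci + patch + 1,
                         cj - patch:cj + patch + 1]
            weight_sum, acc = 0.0, 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - patch:ni + patch + 1,
                                  nj - patch:nj + patch + 1]
                    d2 = np.mean((ref - cand) ** 2)   # patch distance
                    w = np.exp(-d2 / (h * h))          # similarity weight
                    weight_sum += w
                    acc += w * padded[ni, nj]
            out[i, j] = acc / weight_sum
    return out

# toy usage: smooth a small noisy frame
rng = np.random.default_rng(0)
noisy = 0.1 * rng.normal(size=(8, 8))
smoothed = nlm_denoise(noisy)
```

In practice a video pipeline would apply such a filter frame by frame (production code would use an optimized library routine rather than this double loop).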
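The chi-square feature-selection step can likewise be sketched in a few lines. The matrix `X`, labels `y`, and `k` below are toy placeholders, not the paper's actual extracted features; this is a minimal numpy-only sketch that assumes non-negative feature values, as the chi-square test requires.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score per feature for a non-negative feature matrix X
    and integer class labels y: compares each feature's observed per-class
    totals against the totals expected under class-independence."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    observed = np.array([X[y == c].sum(axis=0) for c in classes])
    class_prob = np.array([(y == c).mean() for c in classes])
    expected = np.outer(class_prob, X.sum(axis=0))
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def select_top_k(X, y, k):
    """Indices of the k features with the highest chi-square score."""
    top = np.argsort(chi2_scores(X, y))[::-1][:k]
    return np.sort(top)

# toy example: 4 samples, 3 features, 2 classes; features 0 and 1 are
# class-dependent, feature 2 is roughly uniform across classes
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.1, 4.0],
              [0.0, 5.0, 3.5],
              [0.1, 6.0, 4.0]])
y = np.array([0, 0, 1, 1])
kept = select_top_k(X, y, k=2)   # keeps the class-dependent features
```

The selected indices would then be used to slice both the video-frame and EEG feature matrices before classification.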
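The GCN component of RF-GCN builds on the standard graph-convolution layer, H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W). The sketch below shows only this generic building block; the paper's residual fusion with a DRN is not reproduced here, and the toy graph and weights are invented for illustration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One generic graph-convolution layer:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# toy graph: 3 nodes in a chain, 2-dim node features, 2 output channels
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))                 # random toy weights
H_next = gcn_layer(A, H, W)
```

Stacking such layers lets node features aggregate information from progressively larger neighborhoods, which is how a GCN can model relationships between emotional states.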