2020
DOI: 10.3390/app10114002
Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning

Abstract: When automatic facial expression recognition is applied to video sequences of speaking subjects, the recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during spontaneous conversations: along with the affective expressions, the speech articulation process influences facial configurations. In this work we question whether, aside from facial features, other cues relating to the articulation process would increase emotion…

Cited by 20 publications (13 citation statements). References 53 publications.
“…Example applications of the LSTM architecture are: data classification [ 22 ], speech recognition [ 23 , 24 ], handwriting recognition [ 25 ], speech synthesis [ 26 ], text coherence tests [ 27 ], biometric authentication and anomaly detection [ 28 ], detecting deception from gaze and speech [ 29 ] and anomaly detection [ 30 ]. Similarly, example applications of the GRU structure are: facial expression recognition [ 31 ], human activity recognition [ 32 ], cyberbullying detection [ 33 ], defect detection [ 34 ], human activity surveillance [ 35 ], automated classification of cognitive workload tasks [ 36 ] and speaker identification [ 37 ].…”
Section: Introduction
confidence: 99%
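The GRU mentioned above is simpler than the LSTM: it keeps a single hidden state governed by an update gate and a reset gate. As a minimal, self-contained sketch (not the architecture of any cited paper; all weight names here are illustrative), one GRU step in NumPy looks like:

```python
import numpy as np

def gru_cell(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate: how much to overwrite
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate: how much history to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate hidden state
    return (1 - z) * h_prev + z * h_tilde           # interpolate old and candidate state

# Run a toy 5-step sequence of 4-d inputs through a 3-unit cell.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal((d_h, d_in)) if i % 2 == 0
          else rng.standard_normal((d_h, d_h)) for i in range(6)]
h = np.zeros(d_h)
for t in range(5):
    h = gru_cell(rng.standard_normal(d_in), h, *params)
```

Because the new state is a convex combination of the previous state and a tanh candidate, each hidden unit stays in (-1, 1), which is part of what makes the gated update numerically stable over long sequences.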
“…In [3], research was carried out to determine the contribution of audiovisual cues to emotion recognition and how the temporal relation between audio and video impacts the rate and speed of emotion recognition. In [6], work was carried out on the RAVDESS dataset [4]; the frame count was increased in steps of 5, from 5 frames/video up to 65 frames/video. Two neural networks were designed, namely a Spatial Temporal CNN [6]…”
Section: Literature Review
confidence: 99%
“…In a video with subjects involved in a spontaneous conversation, the speech articulation process conspicuously influences facial configuration and has been observed to reduce the FER accuracy as compared to the case where the subjects are not talking. Bursic et al [27] noted that while examining FEs of subjects involved in such conversations, the speaking effect needs to be regarded as a crucial factor. They developed a deep neural network-based model that analyzed cues related to facial features and speech articulation extracted from a model trained for lipreading.…”
Section: Related Work
confidence: 99%
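The approach described above combines two per-frame cue streams, facial features and articulation features from a lipreading model, before classification. A minimal sketch of that fusion idea (the feature dimensions, pooling, and linear read-out here are illustrative placeholders, not the cited network):

```python
import numpy as np

# Hypothetical per-frame features: a 64-d facial descriptor and a 32-d
# articulation descriptor (e.g. from a pretrained lipreading model).
rng = np.random.default_rng(1)
n_frames = 10
facial = rng.standard_normal((n_frames, 64))
articulation = rng.standard_normal((n_frames, 32))

# Early fusion: concatenate the two cues frame-by-frame, so a downstream
# classifier can learn to discount mouth movements that are explained by
# speech articulation rather than by emotion.
fused = np.concatenate([facial, articulation], axis=1)

# Toy linear read-out over the mean-pooled sequence (a stand-in for the
# deep sequence model), scoring 7 basic emotion classes.
W = rng.standard_normal((7, fused.shape[1]))
logits = W @ fused.mean(axis=0)
pred = int(np.argmax(logits))
```

With real features, the linear read-out would be replaced by a trained recurrent or convolutional classifier; the sketch only illustrates where the articulation cue enters the pipeline.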