A survey on Speech Emotion Recognition

Reshma, C V; Rajasree, R.

doi:10.1109/icci46240.2019.9404432

Cited by 6 publications

(1 citation statement)

References 9 publications

“…Rhythmic features are those features that can be perceived by humans, such as intonation and rhythm, which are the most significant features for expressing emotional content in speech emotion recognition [10]-[14]. Sound quality features are used to measure whether the speech is pure, clear, easily recognizable, etc.…”

Section: Introduction (mentioning)

confidence: 99%

Speech emotion recognition based on dynamic convolutional neural network

Lin, Hu, Zhu

2023

JCEIM

In speech emotion recognition, deep learning algorithms that extract and classify features from emotional audio samples usually require large amounts of computational resources, which makes the system more complex. This paper proposes a speech emotion recognition system based on a dynamic convolutional neural network combined with a bidirectional long short-term memory (BiLSTM) network. On the one hand, the dynamic convolution kernel allows the network to extract global dynamic emotion information, improving performance while keeping the model's computational cost manageable. On the other hand, the BiLSTM network uses temporal information to classify the emotion features more effectively. Experiments on the CISIA Chinese speech emotion dataset, the EMO-DB German emotion corpus, and the IEMOCAP English corpus yield average emotion recognition accuracies of 59.08%, 89.29%, and 71.25%, which are 1.17%, 1.36%, and 2.97% higher, respectively, than those of speech emotion recognition systems using mainstream models. These results demonstrate the effectiveness of the proposed method.
