Two-dimensional cepstrum analysis approach in emotion recognition from speech

Guoth, Igor; Chmulík, Michal; Polacky, Jozef; Kuba, Michal

doi:10.1109/tsp.2016.7760892

Cited by 4 publications

(3 citation statements)

References 17 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We show that the proposed SER system offers higher accuracy, sensitivity, and specificity than other NN-based SER systems, as well as the lowest average processing time, surpassing other state-of-the-art methods with up to 36 s. The system also proves to be robust, offering over 73% accuracy and similar processing time across different databases, and is only surpassed by the method based on bat algorithm and PSO [5] on naturalistic databases, showing that the proposed SER system performs very well at predicting emotions that are stimulated, while for emotions collected in naturalistic conditions, other methods offer better results and can be fused with the proposed FFNN-based method to reach higher accuracy.…”

Section: Discussionmentioning

confidence: 83%

“…Moreover, the proposed system offers higher accuracy, sensitivity, and specificity compared to other non-NN-based SER systems on McGilloway [34], structured Belfast [35], SALAS [36], and the database proposed in this study. When tested on the AVIC database [32], the state-of-the-art method offering the most accurate predictions remains the one based on Bat algorithm and PSO [5]. This shows that our system performs well at predicting emotions that are stimulated, while for emotions collected in naturalistic conditions other methods offer higher accuracy and could be fused with the proposed FFNN-based method to reach better accuracy.…”

Section: Comparison With State-of-the-art Methodsmentioning

confidence: 81%

“…In the literature, a broad spectrum of methods has been proposed for building efficient and accurate speech emotion recognition (SER) systems. Guoth et al [5] propose a two-dimensional homomorphic analysis based on bat algorithm and particle swarm optimization (PSO) for recognizing the six basic emotions (fear, anger, surprise, happiness, sadness, and disgust) in continuous space. The proposed architecture is tested on the interactive emotional dyadic motion capture (IEMOCAP) database [6] and reaches an accuracy of 75%.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Feedforward Neural Network-Based Architecture for Predicting Emotions from Speech

Gavrilescu

Vizireanu

2019

Data

View full text Add to dashboard Cite

We propose a novel feedforward neural network (FFNN)-based speech emotion recognition system built on three layers: A base layer where a set of speech features are evaluated and classified; a middle layer where a speech matrix is built based on the classification scores computed in the base layer; a top layer where an FFNN- and a rule-based classifier are used to analyze the speech matrix and output the predicted emotion. The system offers 80.75% accuracy for predicting the six basic emotions and surpasses other state-of-the-art methods when tested on emotion-stimulated utterances. The method is robust and the fastest in the literature, computing a stable prediction in less than 78 s and proving attractive for replacing questionnaire-based methods and for real-time use. A set of correlations between several speech features (intensity contour, speech rate, pause rate, and short-time energy) and the evaluated emotions is determined, which enhances previous similar studies that have not analyzed these speech features. Using these correlations to improve the system leads to a 6% increase in accuracy. The proposed system can be used to improve human–computer interfaces, in computer-mediated education systems, for accident prevention, and for predicting mental disorders and physical diseases.

show abstract

Section: Discussionmentioning

confidence: 83%