Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

doi:10.1371/journal.pone.0108975

Cited by 17 publications

(12 citation statements)

References 47 publications


“…However, a vector of too many features may give rise to high dimension and redundancy, making the learning process complicated and increasing the likelihood of overfitting [28]. Therefore prior to classification, methods of balancing a numerous features vector, feature selection or extraction are studied to speed up the learning process and minimize the curse of dimensionality problem [29, 30]. Emotion classification is generally performed using standard techniques such as SVM [31, 32, 33], various types of artificial neural networks (NN) [34, 35, 36, 37], different types of the k-NN classifier [19, 38] or using Hidden Markov Model (HMM) and its variations [39].…”

Section: Related Work (mentioning)

confidence: 99%

Emotional Speech Recognition Based on the Committee of Classifiers

Kamińska

2019

Entropy


This article presents a novel method for emotion recognition from speech based on a committee of classifiers. Different classification methods were juxtaposed in order to compare several alternative approaches for final voting. The research is conducted on three different types of Polish emotional speech: acted out with the same content, acted out with different content, and spontaneous. A pool of descriptors commonly utilized for emotional speech recognition, expanded with sets of various perceptual coefficients, is used as input features. This research shows that the presented approach improves performance with respect to a single classifier.


“…Pfister and Robinson [30] proposed an emotion classification framework that consists of n(n-1)/2 pairwise SVMs for n labels, each with a differing set of features selected by the correlation-based feature selection algorithm. Arruti et al. [32] used four machine learning paradigms (IB, ID3, C4.5, NB) and evolutionary algorithms to select feature subsets that noticeably optimize the automatic emotion recognition success rate. Schuller et al. [24] combined SVMs, decision trees and Bayesian classifiers to yield higher classification accuracy.…”

Section: Related Work (mentioning)

confidence: 99%
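The n(n-1)/2 figure in the statement above comes from training one binary classifier per unordered pair of labels. The following is a minimal sketch of that enumeration only, not the cited framework: the label names are hypothetical, and neither SVM training nor the per-pair feature selection is shown.

```python
from itertools import combinations

# Hypothetical label set (illustrative names, not taken from the cited papers).
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]


def pairwise_tasks(labels):
    """Return every unordered pair of labels; one binary classifier per pair."""
    return list(combinations(labels, 2))


tasks = pairwise_tasks(EMOTIONS)
n = len(EMOTIONS)

# The number of pairwise tasks matches the closed form n(n-1)/2.
print(len(tasks), n * (n - 1) // 2)
```

With seven labels this gives 21 pairwise tasks. In a one-vs-one scheme of this kind, each pair could in principle be given its own feature subset, which is the property the statement attributes to Pfister and Robinson.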

“…In Table 1, audio recognition accuracy percentages for the different types of utterances (depending on the language) are presented. It has also to be noted that several automatic emotion recognition systems have used the RekEmozio dataset in previous works, such as [32, 42].…”

Section: Case Study (mentioning)

confidence: 99%

“…With regard to global features, statistics containing measures, such as the mean, variance, standard deviation and the maximum and minimum values and their positions, were computed, among others. The full set of the 123 features we used in the first phase of this work, including local characteristics and their correlated global statistics, were described in more detail in [32].…”

Section: Case Study (mentioning)

confidence: 99%
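The global statistics named in the statement above (mean, variance, standard deviation, and the maximum and minimum values with their positions) can be illustrated on a single local feature contour. This is a rough sketch under stated assumptions, not the authors' in-house extraction code: the function name and the pitch values are hypothetical, and the use of population (rather than sample) variance is an assumption, since the source does not say which was used.

```python
from statistics import mean, pstdev, pvariance


def global_statistics(contour):
    """Summarise a per-frame (local) feature contour with utterance-level statistics."""
    max_value = max(contour)
    min_value = min(contour)
    return {
        "mean": mean(contour),
        "variance": pvariance(contour),  # population variance (assumption)
        "std": pstdev(contour),          # population standard deviation (assumption)
        "max": max_value,
        "max_position": contour.index(max_value),  # first frame index of the maximum
        "min": min_value,
        "min_position": contour.index(min_value),  # first frame index of the minimum
    }


# Hypothetical pitch contour in Hz, one value per frame (made-up numbers).
pitch = [110.0, 120.0, 150.0, 130.0, 90.0]
stats = global_statistics(pitch)
print(stats)
```

Applying the same function to each local contour (for example pitch or energy) would yield one block of global features per contour; how the cited work assembles its 123 features is described in its reference [32], not here.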


Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

Álvarez, Sierra, Arruti, et al.

2015

Sensors

Self Cite


In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one.


“…There is an avalanche of intrinsic socioeconomic advantages that make speech signals a good source for affective computing. They are economically easier to acquire than other biological signals like electroencephalogram, electrooculography and electrocardiograms [17], which makes speech emotion recognition research attractive [18]. Machine learning algorithms extract a set of speech features with a variety of transformations to appositely classify emotions into different classes.…”

Section: Introduction (mentioning)

confidence: 99%

Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition

Zvarevashe, Olugbara

2020

Algorithms


Automatic recognition of emotion is important for facilitating seamless interactivity between a human being and intelligent robot towards the full realization of a smart society. The methods of signal processing and machine learning are widely applied to recognize human emotions based on features extracted from facial images, video files or speech signals. However, these features were not able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features from a group of carefully selected features to realize hybrid acoustic features for improving the task of emotion recognition. Experiments were performed to test the effectiveness of the proposed features extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.

