On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition

Kačur, Juraj; Puterka, Boris; Pavlovičová, Jarmila; Oravec, Milos

doi:10.3390/s21051888

Cited by 18 publications

(9 citation statements)

References 41 publications

“…A set of acoustic features needs to be determined for every SER application. Although many sets have been proposed and many studies agree on using specific domains, namely energy, pitch, prosody and cepstrum [51], the cross-linguistic nature of this study and the need for generalization called for a wide, non-standard set of features to then be reduced. The feature set of choice comes from the INTERSPEECH 2013 library [52], embedded in the feature extraction tool OpenSMILE (by Audeering) [53].…”

Section: Methods (mentioning)

confidence: 99%

The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning

Costantini, Parada-Cabaleiro, Casali, et al. 2022

Sensors

Machine Learning (ML) algorithms within a human–computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work aims to explore the feasibility and characteristics of a cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naïve Bayes and MLP) are applied to acoustic features, obtained through a procedure based on Kononenko’s discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear, happiness, anger and sadness), using the Emofilm database, comprised of short clips of English movies and the respective Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results see MLP as the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. The results show cross-gender tasks to be more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely, RASTA, F0, MFCC and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard sets. To our knowledge, this is one of the first studies encompassing cross-gender and cross-linguistic assessments on SER.

“…Our proposed method may also result in errors owing to various noise environments. To overcome this problem, we aimed to reduce the number of features in the dataset by creating new features from existing features [69]. Since overfitting was one of the main issues for training different models during the competition, enriching the training data by adding data samples from different resources could be a possible solution for improving the results.…”

Section: Limitations (mentioning)

confidence: 99%

Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm

Abdusalomov, Safarov, Rakhimov, et al. 2022

Sensors

Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker’s features in the speech signal, and recognize the speaker thereafter. In general, the speech recognition process involves three main steps: acoustic processing, feature extraction, and classification/recognition. The purpose of feature extraction is to illustrate a speech signal using a predetermined number of signal components. This is because all information in the acoustic signal is excessively cumbersome to handle, and some information is irrelevant in the identification task. This study proposes a machine learning-based approach that performs feature parameter extraction from speech signals to improve the performance of speech recognition applications in real-time smart city environments. Moreover, the principle of mapping a block of main memory to the cache is used efficiently to reduce computing time. The block size of cache memory is a parameter that strongly affects the cache performance. In particular, the implementation of such processes in real-time systems requires a high computation speed. Processing speed plays an important role in speech recognition in real-time systems. It requires the use of modern technologies and fast algorithms that increase the acceleration in extracting the feature parameters from speech signals. Problems with overclocking during the digital processing of speech signals have yet to be completely resolved. The experimental results demonstrate that the proposed method successfully extracts the signal features and achieves seamless classification performance compared to other conventional speech recognition algorithms.

“…Literature [22] puts forward that the semantic inclination of each sentence in the text can be obtained by weight first calculation method, emotional education is discussed on the combination of emotional words in college physical education, and the concept of the headword is put forward to calculate the inclination of words, which lays a foundation for more complicated emotional analysis of the text. In literature [23] through the big data analysis method, physical education teachers in colleges and universities should make full use of the advantages of disciplines in emotional education, pay attention to active and healthy emotional communication with students, win the respect and cooperation of students, and obtain the best educational effect. Literature [24] shows that the grades of praise and disapproval in college physical education can be divided into three categories (positive emotion, negative emotion, and neutral emotion) by star rating index, and the polarity classification of emotional education in comment text is completed by using the experimental algorithm using three classification methods, among which the method of the support-vector machine gets higher accuracy.…”

Section: Related Work (mentioning)

confidence: 99%

Analysis on the Penetration of Emotional Education in College Physical Education Based on Emotional Feature Clustering

Guo, Wang 2022

Scientific Programming

Physical education is a highly skilled education offered in colleges and universities. Teachers do not appear in front of inanimate machines as laborers, and they are not the same as gardeners who grow colorful trees, according to their essential characteristics. Their work is aimed at flesh-and-blood students who are sentimental, thoughtful, and engaged in critical thinking. As a result, schools should prioritize physical education and place a premium on emotional infiltration education, improve learning interest, improve teacher-student relationships, create a harmonious teaching environment, and improve teaching quality; it has a significant impact on an individual’s entire life. The modern educational process places a premium on the transmission of rational knowledge while overlooking the accumulation of emotional experience. The cultivation and development of emotional feeling ability, emotional expression, and expression ability receive less attention than the training and improvement of language, concept, logic, and reasoning abilities. Emotion feature clustering is used to propose an emotion recognition method in this study. This method generates extended features for classification by constructing a co-occurrence matrix based on the co-occurrence relationship of emotion features and then by applying the spectral clustering method. The binary value of whether emotional features of emotional education in college physical education appear in a particular cluster is then expressed as a feature and extended to the original training feature set, alleviating the problem of sparse features.
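
The pipeline outlined at the end of this abstract (co-occurrence matrix, spectral clustering, binary cluster-membership features) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vocabulary, and the restriction to two clusters (a split by the sign of the Fiedler vector, which assumes a connected co-occurrence graph) are all assumptions made for illustration.

```python
import numpy as np


def cooccurrence_matrix(samples, vocab):
    """Count how often each pair of emotion features appears in the same sample."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sample in samples:
        present = sorted({index[t] for t in sample if t in index})
        for a in present:
            for b in present:
                if a != b:
                    counts[a, b] += 1
    return counts


def spectral_bipartition(affinity):
    """Two-way spectral clustering (illustrative simplification).

    Builds the unnormalized graph Laplacian L = D - W and splits the
    features by the sign of the eigenvector belonging to the second
    smallest eigenvalue (the Fiedler vector). Assumes a connected graph.
    """
    degree = affinity.sum(axis=1)
    laplacian = np.diag(degree) - affinity
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


def cluster_indicator_features(samples, vocab, labels):
    """One binary column per cluster: 1 if any feature of that cluster occurs."""
    index = {term: i for i, term in enumerate(vocab)}
    n_clusters = int(labels.max()) + 1
    out = np.zeros((len(samples), n_clusters), dtype=int)
    for row, sample in enumerate(samples):
        for term in sample:
            if term in index:
                out[row, labels[index[term]]] = 1
    return out


# Hypothetical toy data: each sample is the list of emotion words it contains.
vocab = ["happy", "joy", "sad", "gloomy"]
samples = [
    ["happy", "joy"],
    ["joy", "happy"],
    ["sad", "gloomy"],
    ["gloomy", "sad"],
    ["happy", "sad"],
]

affinity = cooccurrence_matrix(samples, vocab)
labels = spectral_bipartition(affinity)
extended = cluster_indicator_features(samples, vocab, labels)
```

The `extended` columns would then be appended to the original training features. Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.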
