Ten Recent Trends in Computational Paralinguistics

Schuller, Björn; Weninger, Felix

doi:10.1007/978-3-642-34584-5_3

Cited by 10 publications

(8 citation statements)

References 92 publications

“…In our experiments, we used such evaluation metrics as the per class Accuracy, Precision, Recall, and F1-score. Due to the unequal number of samples in each test class (unequal priors), we have analyzed the results using Unweighted Average Recall (UAR) for multiclass classifiers, closely related to the accuracy as a good or even better metric to optimize when the sample class ratio is imbalanced [72]. UAR is defined as the average across the diagonal of the confusion matrix.…”

Section: Evaluation Setup (mentioning)

confidence: 99%
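As a minimal sketch of the UAR metric named in the statement above: the function below averages the per-class recalls, which are the diagonal entries of the confusion matrix after each row is divided by its row sum. It assumes rows hold the true classes and columns the predictions. The function name and the example matrix are hypothetical and do not come from the cited paper.

```python
def unweighted_average_recall(confusion):
    """Mean of per-class recalls.

    Assumes confusion[i][j] counts samples of true class i predicted as class j.
    Classes with no true samples are skipped rather than counted as zero recall.
    """
    recalls = []
    for i, row in enumerate(confusion):
        support = sum(row)  # number of true samples in class i
        if support > 0:
            recalls.append(row[i] / support)
    if not recalls:
        raise ValueError("confusion matrix contains no samples")
    return sum(recalls) / len(recalls)


# Hypothetical 3-class test set with unequal class sizes (100, 10, 10 samples).
example = [
    [90, 5, 5],
    [2, 6, 2],
    [1, 1, 8],
]
uar = unweighted_average_recall(example)

# Plain accuracy for comparison: correct predictions over all samples.
correct = sum(example[i][i] for i in range(len(example)))
accuracy = correct / sum(sum(row) for row in example)
```

On this made-up matrix the per-class recalls are 0.9, 0.6 and 0.8, so UAR is about 0.77, while accuracy is about 0.87 because the large first class dominates the count.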

“…The test set for automatic evaluation consists of 33 separate samples of acting emotional speech of Russian children used in perception tests [21]. Both classifiers were trained based on the eGeMAPS feature set [72].…”

Section: Comparison Of the Subjective Evaluation And Automatic Emotio... (mentioning)

confidence: 99%

“…of Russian children used in perception tests [21]. Both classifiers were trained based on the eGeMAPS feature set [72]. The results of automatic classification are shown in Tables 8 and 9.…”

Section: Comparison Of the Subjective Evaluation And Automatic Emotio... (mentioning)

confidence: 99%

Automatic Speech Emotion Recognition of Younger School Age Children

Matveev, Matveev, Frolova et al. 2022. Mathematics.

This paper introduces the extended description of a database that contains emotional speech in the Russian language of younger school age (8–12-year-old) children and describes the results of validation of the database based on classical machine learning algorithms, such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). The validation is performed using standard procedures and scenarios of the validation similar to other well-known databases of children’s emotional acting speech. Performance evaluation of automatic multiclass recognition on four emotion classes “Neutral (Calm)—Joy—Sadness—Anger” shows the superiority of SVM performance and also MLP performance over the results of perceptual tests. Moreover, the results of automatic recognition on the test dataset which was used in the perceptual test are even better. These results prove that emotions in the database can be reliably recognized both by experts and automatically using classical machine learning algorithms such as SVM and MLP, which can be used as baselines for comparing emotion recognition systems based on more sophisticated modern machine learning methods and deep neural networks. The results also confirm that this database can be a valuable resource for researchers studying affective reactions in speech communication during child-computer interactions in the Russian language and can be used to develop various edutainment, health care, etc. applications.

“…In such cases the quantization into a few categorical labels might lead to a loss in model representativeness [7]. In comparison with the categorical problem, only a few publications have addressed the dimensional recognition challenges, yet it has become a trend in the affective computing community [7], [40], [46], [47], [48], [49]. Some works approximated dimensional affect indicators with fine-grained quantization scales on segmented data, as in [42].…”

Section: Related Work (mentioning)

confidence: 99%

Dimensional Affect Recognition from HRV: An Approach Based on Supervised SOM and ELM

Bugnon, Calvo, Milone. 2020. IEEE Trans. Affective Comput.

Dimensional affect recognition is a challenging topic and current techniques do not yet provide the accuracy necessary for HCI applications. In this work we propose two new methods. The first is a novel self-organizing model that learns from similarity between features and affects. This method produces a graphical representation of the multidimensional data which may assist the expert analysis. The second method uses extreme learning machines, an emerging artificial neural network model. Aiming for minimum intrusiveness, we use only the heart rate variability, which can be recorded using a small set of sensors. The methods were validated with two datasets. The first is composed of 16 sessions with different participants and was used to evaluate the models in a classification task. The second one was the publicly available Remote Collaborative and Affective Interaction (RECOLA) dataset, which was used for dimensional affect estimation. The performance evaluation used the kappa score, unweighted average recall and the concordance correlation coefficient. The concordance coefficient on the RECOLA test partition was 0.421 in arousal and 0.321 in valence. Results show that our models outperform state-of-the-art models on the same data and provide new ways to analyze affective states.
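For readers unfamiliar with the concordance correlation coefficient mentioned in this abstract, here is a minimal sketch using the standard definition by Lin: twice the covariance divided by the sum of both variances and the squared difference of the means. It uses population (divide by n) statistics. The function name and the sample values are hypothetical, and this is not the evaluation code of the cited paper.

```python
def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Population variances and covariance (normalized by n, not n - 1).
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom


# Hypothetical annotation trace and two candidate predictions.
gold = [1.0, 2.0, 3.0]
perfect = concordance_correlation(gold, [1.0, 2.0, 3.0])
shifted = concordance_correlation(gold, [2.0, 3.0, 4.0])
```

Unlike Pearson correlation, the coefficient penalizes a constant offset: the shifted prediction above is perfectly correlated with the gold trace, yet its concordance is 4/7 (about 0.57) rather than 1.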

“…There is an increasing amount of research in that field [1][2][3] [4] and a number of Interspeech challenges in recent years have been organized with the intention to foster research in the many different aspects of paralanguage and to combine the sometimes scattered research efforts leveraging synergy effects [5].…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical neural networks and enhanced class posteriors for social signal classification

Brueckner, Schuller. 2013. 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

With the impressive advances of deep learning in recent years the interest in neural networks has resurged in the fields of automatic speech recognition and emotion recognition. In this paper we apply neural networks to address speaker-independent detection and classification of laughter and filler vocalizations in speech. We first explore modeling class posteriors with standard neural networks and deep stacked autoencoders. Then, we adopt a hierarchical neural architecture to compute enhanced class posteriors and demonstrate that this approach introduces significant and consistent improvements on the Social Signals Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE). On this task we achieve a value of 92.4% of the unweighted average area-under-the-curve, which is the official competition measure, on the test set. This constitutes an improvement of 9.1% over the baseline and is the best result obtained so far on this task. Index Terms: enhanced posteriors, hierarchical neural networks, deep autoencoder networks, computational paralinguistics challenge
