2017
DOI: 10.1109/taffc.2016.2531664

Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

Abstract: Automatic emotion recognition from speech has been recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity found in the definition of a gold standard from a pool of raters and the issue of data scarcity in training models. In this work, we introduce a novel emotion recognition …
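The abstract highlights the subjectivity of deriving a gold standard from a pool of raters. As a minimal sketch of what such a fusion step can look like, assuming simple per-rater z-normalisation followed by frame-wise averaging (not necessarily the method proposed in the paper), the Python snippet below merges several hypothetical rater traces into one reference signal.

import numpy as np

def fuse_gold_standard(rater_traces):
    # rater_traces: (n_raters, n_frames) time-continuous annotations, e.g. arousal
    traces = np.asarray(rater_traces, dtype=float)
    # z-normalise each rater to reduce individual bias and scale differences
    mean = traces.mean(axis=1, keepdims=True)
    std = traces.std(axis=1, keepdims=True) + 1e-8  # guard against constant traces
    normalised = (traces - mean) / std
    # frame-wise average as the (assumed) gold standard
    return normalised.mean(axis=0)

# three hypothetical raters annotating five frames
raters = [[0.1, 0.2, 0.4, 0.3, 0.2],
          [0.0, 0.1, 0.5, 0.4, 0.1],
          [0.2, 0.3, 0.6, 0.5, 0.3]]
print(fuse_gold_standard(raters))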

Cited by 47 publications (29 citation statements)
References 50 publications
“…ii) compared with the use of SYNC or GTVR alone, the combination of SYNC and GTVR (SGP) can improve the system's ability to predict emotions, which illustrates that it is important to reduce the annotation drift and environmental noise before predicting emotions; iii) the p-values of a paired t-test between the CCC values obtained with those three approaches show that the combined approach outperforms each single component; iv) in most cases, we obtain p < 0.001 between pairwise comparative methods for each dimension, verifying the effectiveness of the GTVR method. Moreover, the results illustrate that the proposed GTVR method is more than a mere denoising approach, but rather a signal approximator able to extract the relevant information from the underlying signal; v) consistent with the literature (Mencattini et al., 2017), the results show that arousal is better identified than valence. As an example, Fig.…”
supporting
confidence: 78%
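The statement above evaluates systems through CCC values and paired t-tests. As a rough, self-contained illustration of that evaluation step, the snippet below implements the standard concordance correlation coefficient and applies scipy's paired t-test to per-file CCC scores of two systems; the signal values and score arrays are made-up placeholders, not results from the cited work.

import numpy as np
from scipy.stats import ttest_rel

def ccc(gold, pred):
    # concordance correlation coefficient between two time-continuous signals
    gold, pred = np.asarray(gold, dtype=float), np.asarray(pred, dtype=float)
    mg, mp = gold.mean(), pred.mean()
    vg, vp = gold.var(), pred.var()
    cov = ((gold - mg) * (pred - mp)).mean()
    return 2.0 * cov / (vg + vp + (mg - mp) ** 2)

# agreement between a hypothetical gold-standard trace and a prediction
gold = np.array([0.1, 0.3, 0.5, 0.4, 0.2, 0.0])
pred = np.array([0.2, 0.25, 0.45, 0.5, 0.15, 0.05])
print(f"CCC = {ccc(gold, pred):.3f}")

# made-up per-file CCC scores for two hypothetical systems (e.g., single vs. combined)
ccc_single = np.array([0.41, 0.38, 0.45, 0.40, 0.43])
ccc_combined = np.array([0.47, 0.44, 0.50, 0.46, 0.49])
t_stat, p_value = ttest_rel(ccc_combined, ccc_single)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")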
“…With the development of artificial intelligence, humans have demanded more and more from affective computing, which facilitates the development of an increasing number of automatic speech emotion recognition (SER) applications and, more relevantly, dimensional emotion prediction from time-continuous labels (Gunes and Schuller, 2013; Mencattini et al., 2017; Martinelli et al., 2016; Mariooryad and Busso, 2015). Automatic emotion recognition (AER) technology from speech has matured well enough to be applied in some real-life scenarios (Vignolo et al., 2016), such as call centers (Chen et al., 2012), disease auxiliary diagnosis (Schuller et al., 2015), remote education, and safe driving.…”
Section: Introduction (mentioning)
confidence: 99%
“…(3) maximizes the mutual information between each feature and the expected output, and it is also called Relevance Term. Further details can be found in [1,40].…”
Section: Physical Multimodal Sensors and Data Recording (mentioning)
confidence: 99%
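Eq. (3) in the quoted passage is described as a relevance term that maximizes the mutual information between each feature and the expected output. The sketch below estimates such a term in one common way, averaging sklearn's mutual_info_regression scores over a candidate feature subset; the function name relevance_term and the synthetic data are illustrative assumptions and may differ from the exact formulation in [1,40].

import numpy as np
from sklearn.feature_selection import mutual_info_regression

def relevance_term(X, y, selected):
    # mean mutual information between each selected feature and the target
    mi = mutual_info_regression(X[:, selected], y, random_state=0)
    return mi.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 0.8 * X[:, 2] + 0.1 * rng.normal(size=200)  # target driven mainly by feature 2
print(relevance_term(X, y, [0, 1]))  # weakly related features -> low relevance
print(relevance_term(X, y, [2]))     # informative feature -> high relevance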
“…Several studies show that quantitative measurements of human expression can be used to estimate psychological and physical conditions in humans [1,2,3]. Artificial empathic systems are expected to be of benefit in many diverse domains such as precision medicine, personalized care and therapy, customer satisfaction studies, or web profiling, to mention a few [4][5][6].…”
Section: Introduction (mentioning)
confidence: 99%