This work has been supported by the European H2020 program through the EMPATHIC project and the MENHIR MSCA action, under grants 769872 and 823907, respectively.
In this work, a Spanish corpus developed within the EMPATHIC project¹ framework is presented. It was designed for building a dialogue system capable of talking to elderly people and promoting healthy habits through a coaching model. The corpus, which comprises audio, video, and text channels, was acquired using a Wizard of Oz strategy. It was annotated with different labels according to the different models needed in a dialogue system, including an emotion-based annotation that will be used to generate empathetic system reactions. The annotation at the different levels, along with the procedure employed, is described and analysed.

¹ http://www.empathic-project.eu/
Developing accurate emotion recognition systems requires extracting features that suitably characterise these emotions. In this paper, we propose an original approach to parameter extraction based on the strong theoretical and empirical correlation between emotion categories and dimensional emotion parameters. More precisely, acoustic features and dimensional emotion parameters are combined for better speech emotion characterisation. The procedure consists of building arousal and valence models by regression on the training data and using these models to estimate the corresponding values on the test data. Hence, when classifying an unknown sample into emotion categories, these estimates can be integrated into the feature vectors. The results obtained with this new set of parameters show a significant improvement in speech emotion recognition performance.
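A minimal sketch of the described procedure, not the authors' implementation: it assumes scikit-learn-style models, synthetic acoustic features, and that ground-truth arousal/valence labels are available for the training set (how the training-set augmentation is done is not specified in the abstract). Arousal and valence regressors are fit on the training data, their predictions are appended to the test feature vectors, and an emotion-category classifier is trained on the augmented features.

```python
import numpy as np
from sklearn.svm import SVR, SVC

rng = np.random.default_rng(0)

# Hypothetical data: 200 train / 50 test samples with 40 acoustic features each,
# continuous arousal/valence annotations, and 4 emotion categories.
X_train, X_test = rng.normal(size=(200, 40)), rng.normal(size=(50, 40))
arousal_train = rng.uniform(-1, 1, size=200)
valence_train = rng.uniform(-1, 1, size=200)
y_train = rng.integers(0, 4, size=200)

# 1) Regression models for the dimensional emotion parameters.
arousal_reg = SVR().fit(X_train, arousal_train)
valence_reg = SVR().fit(X_train, valence_train)

# 2) Augment the feature vectors with arousal/valence values
#    (ground truth for training, regression estimates for test).
def augment(X, arousal, valence):
    return np.hstack([X, arousal[:, None], valence[:, None]])

X_train_aug = augment(X_train, arousal_train, valence_train)
X_test_aug = augment(X_test, arousal_reg.predict(X_test), valence_reg.predict(X_test))

# 3) Emotion-category classifier on the augmented features.
clf = SVC().fit(X_train_aug, y_train)
predicted_emotions = clf.predict(X_test_aug)
```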
This work contrasts the similarities and differences between the emotions identified in two very different scenarios: human-to-human interaction in Spanish TV debates and human-machine interaction with a virtual agent in Spanish. To this end, we developed a crowd annotation procedure to label the speech signal in terms of both emotional categories and the Valence-Arousal-Dominance model. The analysis of these data revealed interesting findings that allowed us to profile both the speakers and the tasks. Convolutional Neural Networks were then used for the automatic classification of the emotional samples in both tasks. Experimental results revealed different human behavior in the two tasks and outlined different speaker profiles.
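For illustration only, a small PyTorch sketch of a CNN emotion classifier over speech; the architecture, input representation (log-Mel spectrograms), and number of classes are assumptions, as the abstract does not give these details.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Toy CNN mapping a spectrogram to emotion-category logits."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        # x: (batch, 1, mel_bins, frames)
        return self.classifier(self.features(x))

# Usage on a dummy batch of 8 spectrograms (64 Mel bins x 200 frames).
model = EmotionCNN(n_classes=4)
logits = model(torch.randn(8, 1, 64, 200))
print(logits.shape)  # torch.Size([8, 4])
```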