More than a decade has passed since research on the automatic recognition of emotion from speech became a new field of research in line with its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can tell us about where to go next and how we could get there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, and moving on to the lessons actually learnt, before we finally address the ever-lasting problems and promising future attempts.
Keywords: emotion, affect, automatic classification, feature types, feature selection, noise robustness, adaptation, standardisation, usability, evaluation
Setting the Scene
This special issue addresses new approaches to processing realistic emotions in speech, and this overview article gives an account of the state of the art, of the lacunae in this field, and of promising approaches to overcoming shortcomings in modelling and recognising realistic emotions. We also report on the first emotion challenge at INTERSPEECH 2009, which provided the initial impetus for this special issue; to end with, we sketch future strategies and applications, trying to answer the question 'Where to go from here?'
The article is structured as follows: we first deal with the basic phenomenon, briefly reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then proceed to automatic processing (sec. 2), including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first Emotion Challenge (sec. 3), including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, and moving on to the lessons learnt, before concluding this article (sec. 4).