In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the study, a large emotional speech corpus of Finnish was collected: 14 professional actors served as speakers and simulated four primary emotions while reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first was speaker-independent within a small, closed set of speakers, while the second was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with an optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable.
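To make the pipeline concrete, here is a minimal sketch of the kind of approach the abstract describes: a handful of prosodic features (the paper's actual set of more than 40 features is not specified here), followed by a completely speaker-independent evaluation via leave-one-speaker-out cross-validation. The feature choices, the kNN classifier, the `load_corpus_index` helper, and the file layout are all assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch: a few prosodic features and a speaker-independent
# classification experiment. Library: librosa + scikit-learn (an assumption;
# the paper does not name its toolchain).
import numpy as np
import librosa
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def prosodic_features(path):
    y, sr = librosa.load(path, sr=16000)
    # F0 contour via probabilistic YIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    voiced = f0[~np.isnan(f0)]
    # Frame-level RMS energy as a simple intensity correlate.
    rms = librosa.feature.rms(y=y)[0]
    return np.array([
        np.mean(voiced), np.std(voiced),    # F0 level and variability
        np.max(voiced) - np.min(voiced),    # F0 range
        np.mean(rms), np.std(rms),          # energy level and variability
        len(voiced) / len(f0),              # voicing rate (crude tempo proxy)
    ])

# 'samples' is assumed to be a list of (wav_path, emotion, speaker_id)
# tuples covering the corpus; load_corpus_index() is a hypothetical helper.
samples = load_corpus_index()
X = np.array([prosodic_features(path) for path, _, _ in samples])
labels = [emotion for _, emotion, _ in samples]
speakers = [speaker for _, _, speaker in samples]

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# Holding out one speaker per fold keeps the test speaker entirely unseen,
# matching the completely speaker-independent scenario.
scores = cross_val_score(clf, X, labels, groups=speakers, cv=LeaveOneGroupOut())
print(f"mean speaker-independent accuracy: {scores.mean():.2f}")
```

The grouping step is the essential point: evaluating on held-out speakers, rather than held-out utterances, is what separates the second scenario from the first.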
Asperger's syndrome (AS) belongs to the group of autism spectrum disorders and is characterized by deficits in social interaction, manifested, for example, by a lack of social or emotional reciprocity. The disturbance causes clinically significant impairment in social interaction. Abnormal prosody has frequently been identified as a core feature of AS, yet there are virtually no studies on how individuals with AS recognize basic emotions from speech. This study focuses on how adolescents with AS (n = 12) and their typically developed controls (n = 15) recognize the basic emotions happy, sad, angry, and 'neutral' from speech prosody. Adolescents with AS recognized basic emotions from speech prosody as well as their typically developed controls did. Possibly, the ability to recognize basic emotions develops during childhood.
The aim of this investigation is to study how well voice quality conveys emotional content that can be discriminated by human listeners and by a computer. The speech data were produced by nine professional actors (four women, five men). The speakers simulated the following basic emotions in a unit consisting of a vowel extracted from running Finnish speech: neutral, sadness, joy, anger, and tenderness. Automatic discrimination was clearly more successful than human emotion recognition. Human listeners thus apparently need speech samples longer than vowel-length units for reliable emotion discrimination, whereas the machine utilizes quantitative parameters effectively even for short speech samples.
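For vowel-length units, voice-quality measures such as jitter, shimmer, and harmonics-to-noise ratio are the kinds of quantitative parameters a machine can exploit where human listeners struggle. The sketch below computes these three measures via the Praat backend exposed by the parselmouth package; this toolchain and the parameter values (standard Praat defaults) are assumptions, as the paper does not specify its analysis software or feature set.

```python
# Hypothetical sketch: jitter, shimmer, and HNR for a single vowel segment,
# computed with Praat routines through parselmouth.
import parselmouth
from parselmouth.praat import call

def voice_quality(path):
    snd = parselmouth.Sound(path)
    # Glottal pulse marks, required by the period-based measures below;
    # 75 and 500 Hz are the assumed pitch floor and ceiling.
    pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pulses], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    # Harmonics-to-noise ratio, averaged over the segment (in dB).
    harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
    hnr = call(harmonicity, "Get mean", 0, 0)
    return {"jitter": jitter, "shimmer": shimmer, "hnr_dB": hnr}

# e.g. voice_quality("vowel_anger_01.wav")  # hypothetical file name
```

Because these parameters are estimated from periodicity over a few dozen glottal cycles, they remain measurable on a single vowel, which is consistent with the finding that the machine outperforms listeners on such short samples.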