Work in the voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in the same way. With many independent teams working in different research areas, shared standards become an essential safeguard: they ensure compliance with state-of-the-art methods, allow appropriate comparison of results across studies, and enable the integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to large brute-force parameter sets, we present a minimalistic set of voice parameters, selected based on (a) their potential to index affective physiological changes in voice production, (b) their proven value in former studies as well as their automatic extractability, and (c) their theoretical significance. The set is intended to provide a common baseline for the evaluation of future research and to eliminate differences caused by varying parameter sets or differing implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations against the large baseline feature sets of the INTERSPEECH challenges show that the proposed set performs strongly relative to its size.
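Since the abstract names openSMILE as the reference implementation, a minimal extraction sketch may help orient readers. It assumes the `opensmile` Python wrapper around the toolkit and a placeholder file name; the exact feature-set identifiers can differ between releases.

```python
# Minimal sketch: extracting the minimalistic parameter set (GeMAPS functionals)
# for one recording with the opensmile Python wrapper (pip install opensmile).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,        # minimalistic set
    feature_level=opensmile.FeatureLevel.Functionals,   # one feature vector per file
)

# 'speech.wav' is a placeholder path; the result is a pandas DataFrame with
# one column per acoustic parameter.
features = smile.process_file('speech.wav')
print(features.shape)
```

Swapping in a larger configuration (for example, one of the ComParE feature sets used as INTERSPEECH challenge baselines) makes the size comparison reported above straightforward to reproduce.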
Little attention has been paid so far to physiological signals for emotion recognition compared to audiovisual emotion channels such as facial expression or speech. This paper investigates the potential of physiological signals as reliable channels for emotion recognition. All essential stages of an automatic recognition system are discussed, from the recording of a physiological dataset to feature-based multiclass classification. In order to collect a physiological dataset from multiple subjects over many weeks, we used a musical induction method that spontaneously leads subjects to real emotional states, without any deliberate lab setting. Four-channel biosensors were used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to find the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is demonstrated by classification results. Classification of four musical emotions (positive/high arousal, negative/high arousal, negative/low arousal, positive/low arousal) is performed using an extended linear discriminant analysis (pLDA). Furthermore, by exploiting the dichotomic property of the 2D emotion model, we develop a novel scheme of emotion-specific multilevel dichotomous classification (EMDC) and compare its performance with direct multiclass classification using the pLDA. Improved recognition accuracy of 95% and 70% for subject-dependent and subject-independent classification, respectively, is achieved using the EMDC scheme.
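The EMDC idea itself is easy to prototype: classify along the arousal axis first, then classify valence separately within each arousal branch. The sketch below uses scikit-learn's standard LDA as a stand-in for the paper's pLDA; the integer label coding and the two-level split are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of emotion-specific multilevel dichotomous classification (EMDC):
# level 1 separates high vs. low arousal, level 2 separates valence within
# each arousal branch, yielding one of the four quadrant classes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class EMDC:
    """Two-level dichotomous classification over the arousal/valence quadrants.

    Labels are assumed to be integer-coded: arousal and valence each in {0, 1}.
    """

    def fit(self, X, arousal, valence):
        # Level 1: high vs. low arousal, trained on all samples.
        self.arousal_clf = LinearDiscriminantAnalysis().fit(X, arousal)
        # Level 2: a separate valence classifier per arousal branch.
        self.valence_clf = {
            a: LinearDiscriminantAnalysis().fit(X[arousal == a], valence[arousal == a])
            for a in np.unique(arousal)
        }
        return self

    def predict(self, X):
        a_pred = self.arousal_clf.predict(X)
        v_pred = np.empty_like(a_pred)
        for a, clf in self.valence_clf.items():
            mask = a_pred == a
            if mask.any():
                v_pred[mask] = clf.predict(X[mask])
        # Each row is an (arousal, valence) pair, i.e. one of the four quadrants.
        return np.stack([a_pred, v_pred], axis=1)
```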
Little attention has been paid so far to physiological signals for emotion recognition compared to audio-visual emotion channels, such as facial expressions or speech. In this paper, we discuss the most important stages of a fully implemented emotion recognition system, including data analysis and classification. For collecting physiological signals in different affective states, we used a music induction method which elicits natural emotional reactions from the subject. Four-channel biosensors were used to obtain electromyogram, electrocardiogram, skin conductivity and respiration changes. After calculating a sufficient number of features from the raw signals, several feature selection/reduction methods are tested to extract a new feature set consisting of the most significant features for improving classification performance. Three well-known classifiers, linear discriminant function, k-nearest neighbour and multilayer perceptron, are then used to perform supervised classification.
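A comparable selection-plus-classification pipeline is straightforward to set up with scikit-learn; the sketch below pairs a univariate feature selector with the three classifiers named above. The selector choice, the number of retained features and the hyper-parameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: compare LDA, k-NN and an MLP on a reduced physiological feature set.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

classifiers = {
    'LDA': LinearDiscriminantAnalysis(),
    'kNN': KNeighborsClassifier(n_neighbors=5),
    'MLP': MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000),
}

def compare(X, y, k_features=30):
    """Cross-validated accuracy of each classifier on the k best features."""
    for name, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(),
                             SelectKBest(f_classif, k=k_features),
                             clf)
        scores = cross_val_score(pipe, X, y, cv=5)
        print(f'{name}: {scores.mean():.2f} +/- {scores.std():.2f}')
```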
This paper discusses some of the key issues that must be addressed in creating virtual humans, or androids. As a first step, we overview the issues and available tools in three key areas of virtual human research: face-to-face conversation, emotions and personality, and human figure animation. Assembling a virtual human is still a daunting task, but the building blocks are getting bigger and better every day. Building a virtual human is a multidisciplinary effort, joining traditional artificial intelligence problems with a range of issues from computer graphics to social science. Virtual humans must act and react in their simulated environment, drawing on the disciplines of automated reasoning and planning. To hold a conversation, they must exploit the full gamut of natural language processing research, from speech recognition and natural language understanding to natural language generation and speech synthesis. Providing human bodies that can be controlled in real time delves into computer graphics and animation. And because an agent looks like a human, people expect it to behave like one as well and will be disturbed by, or misinterpret, discrepancies from human norms. Thus, virtual human research must draw heavily on psychology and communication theory to appropriately convey nonverbal behavior, emotion, and personality. This broad range of requirements poses a serious problem. Researchers working on particular aspects of virtual humans cannot explore their component in the context of a complete virtual human unless they can understand results across this array of disciplines and assemble the vast range of software tools (for example, speech recognizers, planners, and animation systems) required to construct one. Moreover, these tools were rarely designed to interoperate and, worse, were often designed with different purposes in mind. For example, most computer graphics research has focused on high-fidelity offline image rendering that does not support the fine-grained interactive control that a virtual human must have over its body. In the spring of 2002, about 30 international researchers from across disciplines convened at the University of Southern California to begin to bridge this gap in knowledge and tools (see www.ict.usc.edu/~vhumans). Our ultimate goal is a modular architecture and interface standards that will allow researchers in this area to reuse each other's work. This goal can only be achieved...
We present a data-mining experiment on feature selection for automatic emotion recognition. Starting from more than 1000 features derived from pitch, energy and MFCC time series, the features most relevant with respect to the data are selected from this set by removing correlated features. The features selected for acted and realistic emotions are analysed and show significant differences. All features are computed automatically, and we also contrast automatically determined units of analysis with manually defined ones. A higher degree of automation did not prove to be a disadvantage in terms of recognition accuracy.
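The correlation-based pruning step can be sketched in a few lines; the threshold and the pandas-based representation below are assumptions for illustration, not the exact procedure used in the experiment.

```python
# Sketch: drop one feature from every highly correlated pair.
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Remove features whose absolute pairwise correlation exceeds `threshold`."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)
```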