Automatic detection of the level of human interest is highly relevant for many technical applications, such as automatic customer care or tutoring systems. However, recognising spontaneous interest in natural conversations independently of the subject remains a challenge. Identifying human affective states from a single modality alone is often impossible, even for humans, since different modalities carry partially disjoint cues. Multimodal approaches to human affect recognition are generally shown to boost recognition performance, yet have been evaluated only in restrictive laboratory settings. Herein we introduce a fully automatic combination of Active-Appearance-Model-based facial expression analysis, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process. We provide detailed subject-independent results for classification and regression of the Level of Interest using Support Vector Machines on an audiovisual interest corpus (AVIC) of spontaneous, conversational speech, demonstrating the "theoretical" effectiveness of the approach. Further, to evaluate the approach with regard to real-life usability, a user study is conducted as proof of its "practical" effectiveness.
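A minimal sketch of the early-fusion idea described above, assuming the per-modality features (facial Active Appearance Model parameters, eye activity, acoustic, linguistic, and context descriptors) are already extracted as fixed-length vectors per segment; all feature names, dimensions, and data below are illustrative placeholders, not the corpus or features used in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
n_segments = 200

# Hypothetical per-modality feature matrices (one row per audiovisual segment).
facial_aam   = rng.normal(size=(n_segments, 40))   # Active Appearance Model parameters
eye_activity = rng.normal(size=(n_segments, 6))    # blink / gaze statistics
acoustic     = rng.normal(size=(n_segments, 276))  # prosodic and spectral functionals
linguistic   = rng.normal(size=(n_segments, 100))  # e.g. Bag-of-Words counts
context      = rng.normal(size=(n_segments, 3))    # temporal context information

# Early feature fusion: concatenate all modalities into one vector per segment.
X = np.hstack([facial_aam, eye_activity, acoustic, linguistic, context])

loi_class = rng.integers(0, 3, size=n_segments)      # discrete Level of Interest label
loi_value = rng.uniform(-1.0, 1.0, size=n_segments)  # continuous Level of Interest

# Support Vector Machine classification and regression on the fused vectors.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, loi_class)
reg = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X, loi_value)
```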
Herein we present a comparison of novel concepts for a robust fusion of prosodic and verbal cues in speech emotion recognition. For each spoken phrase, 276 acoustic features are extracted. For linguistic content analysis we use the Bag-of-Words text representation, which allows acoustic and linguistic features to be integrated within one vector prior to final classification. Extensive feature selection is performed by filter- and wrapper-based methods: optimal feature sets are obtained via SVM-SFFS, and single-feature relevance is assessed by information gain ratio. Overall classification is realised by diverse ensemble approaches, with Kernel Machines, Decision Trees, Bayesian classifiers, and memory-based learners as base classifiers. Acoustics-only tests are run on a database of 39 speakers for speaker-independent accuracy analysis; additionally, the public Berlin Emotional Speech database is used. A further database of 4,221 movie-related phrases forms the basis of the combined acoustic and linguistic evaluation. Overall, remarkable performance in the discrimination of seven discrete emotions is observed.
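A brief sketch of combining acoustic features with a Bag-of-Words linguistic representation in a single vector before classification, loosely following the setup described above. The phrases, labels, and acoustic features are placeholders; a mutual-information filter stands in for the paper's information-gain-ratio ranking, and a simple voting ensemble stands in for its ensemble approaches.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

phrases = [
    "i am so happy today", "this is wonderful", "what a great movie",
    "i really enjoyed that", "leave me alone", "i hate this",
    "this makes me so angry", "stop bothering me",
]
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical emotion classes
acoustic = np.random.default_rng(0).normal(size=(len(phrases), 276))

# Bag-of-Words representation of the linguistic content.
bow = CountVectorizer().fit_transform(phrases).toarray()

# Early fusion: one combined acoustic + linguistic vector per phrase.
X = np.hstack([acoustic, bow])

# Filter-based feature selection (information-theoretic ranking).
X_sel = SelectKBest(mutual_info_classif, k=50).fit_transform(X, labels)

# Ensemble of kernel machine, decision tree, Bayesian, and memory-based learners.
ensemble = VotingClassifier([
    ("svm", SVC()),
    ("tree", DecisionTreeClassifier()),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
], voting="hard")
ensemble.fit(X_sel, labels)
```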
In a video conference, the participants usually see the video of the current speaker. However, if somebody reacts (e.g. by nodding), the system should switch to that participant's video; current systems do not support this. We formulate camera selection as a pattern recognition problem and apply HMMs to learn this behaviour, so our system can easily be adapted to different meeting scenarios. Furthermore, while current systems stay on the speaker, our system switches when somebody reacts. In an experimental section we show that, compared to a desired output, a current system shows the wrong camera more than half of the time (frame error rate 53%), whereas our system selects the wrong camera only a quarter of the time (FER 27%).
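An illustrative sketch of HMM-style camera selection and the frame error rate (FER) metric mentioned above. The per-frame activity scores, transition probabilities, and desired camera sequence are made up; the paper's actual features and HMM training are not shown.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_start):
    """Return the most likely camera sequence given per-frame log-emission scores."""
    n_frames, n_cams = log_emit.shape
    delta = np.zeros((n_frames, n_cams))
    back = np.zeros((n_frames, n_cams), dtype=int)
    delta[0] = log_start + log_emit[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: from camera i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

rng = np.random.default_rng(1)
n_frames, n_cams = 300, 4
# Desired camera per frame (piecewise constant, e.g. speaker or reacting participant).
desired = np.repeat(rng.integers(0, n_cams, size=n_frames // 30), 30)

# Noisy per-frame evidence (e.g. speech or motion activity) favouring the desired camera.
log_emit = rng.normal(size=(n_frames, n_cams))
log_emit[np.arange(n_frames), desired] += 2.0

# Sticky transition probabilities discourage rapid camera switching.
trans = np.full((n_cams, n_cams), 0.05 / (n_cams - 1))
np.fill_diagonal(trans, 0.95)
selected = viterbi(log_emit, np.log(trans), np.log(np.full(n_cams, 1.0 / n_cams)))

# Frame error rate: fraction of frames where the selected camera differs from the desired one.
fer = np.mean(selected != desired)
print(f"FER = {fer:.2%}")
```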