Speech Emotion Recognition under White Noise

Huang, Chih Hsuan; Chen, Guo-Ming; Yu, Hua; Yang, Bao; Zhao, Li

doi:10.2478/aoa-2013-0054

Cited by 38 publications

(15 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The developed methods fall into two main categories. The first one includes speech enhancement or noise reduction techniques, either by means of speech sample reconstruction [7], noise compensation using histogram equalization [8], adaptive thresholding in the wavelet domain for noise cancellation [9], or spectral subtraction [10]. However, the downside of such methods is that they strongly rely on prior knowledge of noise, speech, or both, which limits their implementations.…”

Section: Introductionmentioning

confidence: 99%

An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition

2019

View full text Add to dashboard Cite

Because one of the key issues in improving the performance of Speech Emotion Recognition (SER) systems is the choice of an effective feature representation, most of the research has focused on developing a feature level fusion using a large set of features. In our study, we propose a relatively low-dimensional feature set that combines three features: baseline Mel Frequency Cepstral Coefficients (MFCCs), MFCCs derived from Discrete Wavelet Transform (DWT) sub-band coefficients that are denoted as DMFCC, and pitch based features. Moreover, the performance of the proposed feature extraction method is evaluated in clean conditions and in the presence of several real-world noises. Furthermore, conventional Machine Learning (ML) and Deep Learning (DL) classifiers are employed for comparison. The proposal is tested using speech utterances of both of the Berlin German Emotional Database (EMO-DB) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) speech databases through speaker independent experiments. Experimental results show improvement in speech emotion detection over baselines.

show abstract

Section: Introductionmentioning

confidence: 99%

An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition

2019

View full text Add to dashboard Cite

show abstract

“…These noises severely degrade the performance of systems and consequently affect the user experience in real-life conditions Tawari and Trivedi (2010); Schuller et al (2006); Huang et al (2013); Schuller et al (2011). Therefore, a fundamental step in this work is to investigate environmental noise reduction via a novel closed-form solution to the graph signal theory-based method, to reduce noise at features level and improve the performance of emotion prediction.…”

Section: Introductionmentioning

confidence: 99%

“…In Zhang et al (2018) supervised single-channel technique is applied to speech dereverberation and denoising. In Tawari and Trivedi (2010), the authors utilized a speech enhancement technique based on the adaptive thresholding in the wavelet domain to address noises while in Huang et al (2013), the authors studied the influence of additive white Gaussian noise on speakers emotion states via a Gaussian mixture model. However, these methods are performed on discrete emotion conditions.…”

Section: Introductionmentioning

confidence: 99%

A closed-form solution to the graph total variation problem for continuous emotion profiling in noisy environment

Jing

Mao

Chen

et al. 2018

Speech Communication

View full text Add to dashboard Cite

Time-continuous emotion estimation (e. g., arousal and valence) from spontaneous speech expressions has recently drawn increasing commercial attention.However, real-life applications of emotion recognition technology require challenging conditions, such as noise from recording devices and background environments. In this work, we introduce a novel personalized emotion prediction model validated in different noisy environments. It is performed by a three-level noise reduction algorithm: (i) data downsampling, (ii) feature synchronization, and (iii) a modified version of graph total variation. The

show abstract

“…Most studies use data collected in laboratory setups with little or no background noise or reverberation. The effect of noise on emotion prediction has been explored in [41,42,43], but little work has been done to study the effect of noise and/or reverberation on depression data.…”

Section: Introductionmentioning

confidence: 99%

Noise and reverberation effects on depression detection from speech

Mitra

Tsiartas

Shriberg

2016

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

View full text Add to dashboard Cite

Speech-based depression detection has gained importance in recent years, but most research has used relatively quiet conditions or examined a single corpus per study. Little is thus known about the robustness of speech cues in the wild. This study compares the effect of noise and reverberation on depression prediction using 1) standard mel-frequency cepstral coefficients (MFCCs), and 2) features designed for noise robustness, damped oscillator cepstral coefficients (DOCCs). Data come from the 2014 Audio-Visual Emotion Recognition Challenge (AVEC). Results using additive noise and reverberation reveal a consistent pattern of findings for multiple evaluation metrics under both matched and mismatched conditions. First and most notably: standard MFCC features suffer dramatically under test/train mismatch for both noise and reverberation; DOCC features are far more robust. Second, including higher-order cepstral coefficients is generally beneficial. Third, artificial neural networks tend to outperform support vector regression. Fourth, spontaneous speech appears to offer better robustness than read speech. Finally, a cross-corpus (and crosslanguage) experiment reveals better noise and reverberation robustness for DOCCs than for MFCCs. Implications and future directions for real-world robust depression detection are discussed.

show abstract

Speech Emotion Recognition under White Noise

Cited by 38 publications

References 25 publications

An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition

An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition

A closed-form solution to the graph total variation problem for continuous emotion profiling in noisy environment

Noise and reverberation effects on depression detection from speech

Contact Info

Product

Resources

About