Proceedings of the 24th ACM International Conference on Multimedia 2016
DOI: 10.1145/2964284.2967306

Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition

Abstract: Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared gene…
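The paper contrasts denoising in the log-spectral and cepstral domains with standard spectral-domain baselines. As a rough illustration of how those two domains relate (not the paper's actual method), the Python sketch below computes the real cepstrum of one frame as the inverse DFT of its log-magnitude spectrum and applies a low-pass lifter, which smooths the log spectrum; the frame length and lifter cut-off are illustrative assumptions.

```python
import numpy as np

def cepstral_smooth_logspectrum(frame, n_keep=30):
    """Return a cepstrally smoothed (liftered) log-magnitude spectrum of one frame."""
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spec) + 1e-10)        # log-spectral domain
    cep = np.fft.irfft(log_mag, n)                # real cepstrum = IDFT of log magnitude
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0                         # keep low quefrencies (spectral envelope)
    lifter[-(n_keep - 1):] = 1.0                  # and their symmetric counterparts
    return np.fft.rfft(cep * lifter, n).real      # smoothed log-magnitude spectrum

# Example: smooth one 512-sample frame of white noise (output has 257 bins).
print(cepstral_smooth_logspectrum(np.random.randn(512)).shape)
```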

Cited by 34 publications (5 citation statements)
References 29 publications
“…The main differences between MEMS- and dynamic-based recordings can be summarized as the former having a higher degree of background noise due to the omnidirectional nature of the MEMS microphone, and a different frequency response. In our study, a slight degree of noise cancellation was applied using an algorithm based on spectral subtraction, individually learning the noise profile of each audio recording [ 65 ]. For the frequency response, a pre-emphasis procedure was carried out mimicking the declared response of the Shure WH20; the response of an omnidirectional MEMS microphone can, conversely, be well approximated to being flat [ 66 ].…”
Section: Methods (mentioning)
confidence: 99%
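The excerpt describes two preprocessing steps: spectral subtraction with a noise profile learned individually per recording, and pre-emphasis shaped to the declared Shure WH20 response. The sketch below is a hedged approximation of those ideas, not the cited study's implementation: the noise profile is taken as a low percentile of the frame magnitude spectra, and the pre-emphasis is a generic first-order filter rather than a WH20-matched curve; frame_len, hop, pct, floor, and coeff are all illustrative values.

```python
import numpy as np

def denoise_with_learned_profile(x, frame_len=1024, hop=512, pct=10, floor=0.05):
    """Spectral subtraction using a noise profile estimated from the recording itself."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    specs = np.array([np.fft.rfft(x[s:s + frame_len] * window) for s in starts])
    mags = np.abs(specs)
    noise_profile = np.percentile(mags, pct, axis=0)             # learned per recording
    clean_mags = np.maximum(mags - noise_profile, floor * mags)  # spectral floor
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for k, s in enumerate(starts):
        seg = np.fft.irfft(clean_mags[k] * np.exp(1j * np.angle(specs[k])), frame_len)
        out[s:s + frame_len] += seg * window                     # overlap-add resynthesis
        norm[s:s + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

def pre_emphasis(x, coeff=0.97):
    """Generic first-order pre-emphasis (high-frequency boost), not a WH20-matched response."""
    return np.append(x[0], x[1:] - coeff * x[:-1])
```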
“…Audio emotion recognition (also called SER) detects the embedded emotions by processing and understanding speech signals [161]. Various ML-based and DL-based SER systems have been developed on the basis of these extracted features for better analysis [162,163]. Traditional ML-based SER concentrates on the extraction of the acoustic features and the selection of the classifiers.…”
Section: Audio Emotion Recognition (mentioning)
confidence: 99%
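As a minimal sketch of the traditional ML-based SER pipeline mentioned in this excerpt (extract acoustic features, then train a classifier), the snippet below computes utterance-level MFCC statistics with librosa and fits an SVM with scikit-learn. The choice of toolkit, feature set, and classifier are assumptions for illustration, not the setup of the cited works.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, n_mfcc=13):
    """Utterance-level acoustic features: mean and std of MFCCs over time."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_ser_classifier(wav_paths, emotion_labels):
    """Fit a simple SVM emotion classifier on the extracted features."""
    X = np.stack([mfcc_features(p) for p in wav_paths])
    clf = SVC(kernel="rbf", C=1.0)
    return clf.fit(X, emotion_labels)
```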
“…Critically, the accuracy will be affected by the presence of noise in the speech signal. Therefore, for reducing this noise, several noise reduction algorithms can be utilized, like minimum mean square error (MMSE) and log-spectral amplitude MMSE (LogMMSE) [35]. The crucial phases in emotion recognition are feature selection and dimension reduction.…”
Section: Noise Reduction (mentioning)
confidence: 99%
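The MMSE and LogMMSE estimators named in this excerpt apply a frequency-dependent gain derived from a priori and a posteriori SNR estimates. The sketch below uses a simplified decision-directed Wiener-type gain in that spirit; it is not the full MMSE-STSA or LogMMSE derivation (which involves exponential-integral terms), and the leading frames are assumed to be noise-only for the noise PSD estimate.

```python
import numpy as np

def mmse_like_denoise(x, frame_len=512, hop=256, noise_frames=10, alpha=0.98):
    """Simplified MMSE-style suppression: decision-directed SNR + Wiener-type gain."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    lead = min(noise_frames, n_frames)
    # Noise PSD from the leading frames, assumed noise-only for this sketch.
    noise_psd = sum(np.abs(np.fft.rfft(x[i * hop:i * hop + frame_len] * window)) ** 2
                    for i in range(lead)) / max(lead, 1)

    out, norm = np.zeros(len(x)), np.zeros(len(x))
    prev = np.zeros(frame_len // 2 + 1)                  # memory term G^2 |Y|^2
    for i in range(n_frames):
        spec = np.fft.rfft(x[i * hop:i * hop + frame_len] * window)
        psd = np.abs(spec) ** 2
        post = np.maximum(psd / (noise_psd + 1e-12) - 1.0, 0.0)          # a posteriori SNR - 1
        prio = alpha * prev / (noise_psd + 1e-12) + (1 - alpha) * post   # decision-directed a priori SNR
        gain = prio / (1.0 + prio)                       # Wiener-type gain
        prev = (gain ** 2) * psd
        out[i * hop:i * hop + frame_len] += np.fft.irfft(gain * spec, frame_len) * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```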