Interspeech 2019
DOI: 10.21437/interspeech.2019-1811

Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement

Abstract: The use of deep learning (DL) architectures for speech enhancement has recently improved the robustness of voice applications under diverse noise conditions. These improvements are usually evaluated based on the perceptual quality of the enhanced audio or on the performance of automatic speech recognition (ASR) systems. We are interested instead in the usefulness of these algorithms in the field of speech emotion recognition (SER), and specifically in whether an enhancement architecture can effectively remove …

Cited by 48 publications (34 citation statements)
References: 24 publications
“…The authors in [45] used 1D and 2D CNN-LSTM networks to identify speech emotions. The authors in [40] analyzed the effect noise removal techniques have on SER systems. The authors in [11] performed transfer learning and multi-task learning experiments and found that traditional machine learning models may function as well as deep learning models [2,41] for speech emotion recognition given the researchers choose the right input feature.…”
Section: Related Work (mentioning)
confidence: 99%
“…Speech emotion recognition is considered a challenging task in the HCI domain. A large number of methodologies and corpora have been proposed in previous works [10][11] [12]. The early stage of SER research used handcrafted speech features and low-level descriptors to train classic machine learning models.…”
Section: Related Work (mentioning)
confidence: 99%
“…Despite the significant progress in Speech Emotion Recognition (SER) through Deep Neural Networks (DNNs), SER systems still perform poorly in noisy environments [1,2], and when the imperceptible adversarial perturbation is added to test examples [3]. The performance of state-of-the-art SER also degrades in the cross-corpus setting when an acoustic mismatch between training and testing exists [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…This shows that SER systems lack robustness and generalisation which makes them susceptible to unknown test data shifts. Researchers have developed various methods to improve the performance of SER in noisy environment [2,5] and cross-corpus setting [6], however, significant performance improvement is still required.…”
Section: Introduction (mentioning)
confidence: 99%