Interspeech 2018
DOI: 10.21437/interspeech.2018-1883

On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks

Abstract: Generative Adversarial Networks (GANs) have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. GANs consist of a discriminator and a generator working in tandem, playing a min-max game to learn a target underlying data distribution when fed with data points sampled from a simpler distribution (such as a uniform or Gaussian distribution). Once trained, they allow synthetic generation of examples sampled from the target distribution. We invest…
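
For readers unfamiliar with the min-max game mentioned in the abstract, the standard GAN objective from the general literature can be written as below. This is the textbook formulation, given here as an assumption for orientation only; the truncated abstract does not confirm which exact objective or GAN variant the paper uses. The symbols G (generator), D (discriminator), p_data (target data distribution), and p_z (the simpler distribution, e.g. uniform or Gaussian) are notation introduced for this sketch.

```latex
% Standard GAN value function (textbook form; assumed, not taken from the paper).
% D tries to maximize V by telling real samples x from generated samples G(z);
% G tries to minimize V by making G(z) indistinguishable from real data.
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

Once training converges, drawing z from p_z and evaluating G(z) yields the synthetic examples the abstract refers to.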

Cited by 46 publications (45 citation statements)
References 18 publications
“…Automatic speech emotion recognition has begun to take advantage of these methods. Sahu et al [13] explored how GANs could be used to augment utterance-level features, while Chang et al [14] used DCGANs to improve performance on spectrograms.…”
Section: Adversarial Methods
citation type: mentioning
confidence: 99%
“…More recently, speech research has followed the popularization of adversarial methods, including Generative Adversarial Networks (GANs) [11], [12], [13], [14], Wasserstein GANs (WGANs) [15], and CycleGANs [16], [17], [18]. However, many of these generative speech transfer models introduce noise and have a long way to go, as explored by Kaneko et al. [18].…”
Section: Introduction
citation type: mentioning
confidence: 99%
“…In a different way, we apply GAN-based adversarial training to generate robust representations across domains (speakers, to be specific) for speech emotion recognition. Among previous work on SER, GANs are mainly utilized to learn discriminative representations [13] and conduct data augmentation [14]. Our method is different in that we aim to disentangle speaker information and learn speaker-invariant representations for SER.…”
Section: Related Work
citation type: mentioning
confidence: 99%
“…In [4,5,6,7], CNN and LSTM based models are explored from feature representations such as MFCC and openSMILE [8] features. In [9,10,11,12], adversarial learning paradigm is explored for robust recognition. In [13,14], transfer learning approach is explored.…”
Section: Introduction
citation type: mentioning
confidence: 99%