2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 2016
DOI: 10.1109/apsipa.2016.7820699

Speech emotion recognition using convolutional and Recurrent Neural Networks

Abstract: Speech Emotion Recognition (SER) plays an important role in human-computer interface and assistant technologies. In this paper, a new method is proposed using distributed Convolution Neural Networks (CNN) to automatically learn affect-salient features from raw spectral information, and then applying Bidirectional Recurrent Neural Network (BRNN) to obtain the temporal information from the output of CNN. In the end, an Attention Mechanism is implemented on the output sequence of the BRNN to focus on target emoti…
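
The last stage described in the abstract, attention over the BRNN output sequence, can be illustrated with a minimal sketch. The abstract is truncated and does not give the paper's exact formulation, so this assumes a common form: each frame vector is scored by a dot product with a learned vector, the scores are normalized with a softmax over time, and the frames are averaged with those weights. The names `attention_pool`, `frames`, and `w` are hypothetical and not taken from the paper.

```python
import math


def attention_pool(frames, w):
    """Collapse a sequence of frame vectors into one utterance-level vector.

    frames: list of T vectors (each a list of D floats), standing in for
            BRNN outputs. Hypothetical input, not the paper's data format.
    w:      scoring vector of length D (assumed to be learned in training).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in frames]

    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Attention-weighted sum of the frame vectors.
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas
```

With a zero scoring vector every frame gets the same weight and the result reduces to mean pooling over time; a scoring vector that favors some frames shifts the weight toward them, which is the sense in which attention can "focus" on emotionally salient parts of an utterance.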

Cited by 306 publications
(138 citation statements)
References 8 publications
“…SER-LSTM (Lim et al, 2016) is a model that uses recurrent neural networks on top of convolution operations on spectrogram of audio.…”
Section: Acoustic Sentiment Analysis
Citation type: mentioning
confidence: 99%
“…Similarly, Zhang et al [85] used an AlexNet to add increasingly more temporal context to learned feature representations. It is worth noting that while some studies directly used CNN features for affect prediction [142], [128], others combined local temporal modeling with global temporal modeling via RNN [108], [163], or pooling approaches [85].…”
Section: Learning Temporal Features For SER
Citation type: mentioning
confidence: 99%
“…In [1,2,3] feature learning from raw-waveform or spectrogram using CNN, LSTM based models is explored. In [4,5,6,7], CNN and LSTM based models are explored from feature representations such as MFCC and OpenSMILE [8] features. In [9,10,11,12], adversarial learning paradigm is explored for robust recognition.…”
Section: Introduction
Citation type: mentioning
confidence: 99%