2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)
DOI: 10.1109/spin.2017.8049931

Speech emotion recognition with deep learning

Cited by 80 publications (34 citation statements)
References 12 publications
“…A black dot (•) in a cell means the corresponding database was used in the research mentioned at the bottom of the column. [Flattened table: deep-learning SER methods per year, 2005–2020, e.g. HMM/SVM [6], GerDA/RBM [22], LSTM/BLSTM [28], DNN/ANN/ELM [23], DCNN/LSTM [29], GAN [86], CNN/VAE/DAE/AAE/AVB [32], CNN/BLSTM/ATTN/MTL [95], DCNN [79].] Additionally, Figure 2a compares the accuracies reported for deep learning methods on EMO-DB versus those on IEMOCAP, and a clear separation between the published accuracies is visible. Again, one reason could be that EMO-DB has an order of magnitude fewer samples than IEMOCAP, so using it with deep learning methods makes them more prone to overfitting.…”
Section: Discussion (mentioning)
confidence: 99%
“…Another point, reviewing the accuracies and feature sets reported in Table 4, is that there is no apparent relationship between the complexity of the feature set and the reported accuracy; the proposed method itself plays the significant role in the results. Comparing methods on similar databases: Harar et al. [26], using EMO-DB, take just the PCM samples of the wav file as the feature set and report an accuracy of 96.97%, whereas Song et al. [90], with a complex feature set, report an accuracy of 59.8%.…”
Section: Discussion and Conclusion (mentioning)
confidence: 99%
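The excerpt above notes that the highest-accuracy system used "just PCM samples of the wav file" as features. A minimal self-contained sketch of what that means in practice (not the authors' code; the 16 kHz mono sine-wave file written here is a placeholder standing in for a real EMO-DB utterance):

```python
# Sketch (assumption: 16-bit mono PCM wav): "raw PCM features" means the
# sample values themselves are the model input, with no hand-crafted
# acoustic features (no MFCCs, pitch, energy, etc.).
import array
import math
import wave

# Synthesize a short 16 kHz utterance placeholder so the example runs anywhere.
rate = 16000
tone = array.array("h", (int(8000 * math.sin(2 * math.pi * 440 * t / rate))
                         for t in range(rate // 10)))  # 100 ms of a 440 Hz tone
with wave.open("utterance.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit PCM
    w.setframerate(rate)
    w.writeframes(tone.tobytes())

# "Feature extraction" then amounts to reading the PCM samples back.
with wave.open("utterance.wav", "rb") as w:
    pcm = array.array("h")
    pcm.frombytes(w.readframes(w.getnframes()))

features = list(pcm)           # this raw sample vector is the model input
print(len(features))           # 1600 samples = 100 ms at 16 kHz
```

The point of the comparison in the excerpt is that such a trivially simple input representation can still outperform elaborate feature sets when paired with a strong model.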
“…Real-time facial emotion recognition is done through RGB image classification using transfer learning, in which knowledge gained from solving one problem is applied to another problem [19]. Emotion has been recognized from facial expressions using hidden Markov models and deep belief networks with an unweighted average recall (UAR) of about 56.36% [20]. Different image types and emotions were examined for detecting expressions from facial images using different classifiers such as KNN, HMM, GMM, and SVM [21].…”
Section: Literature Survey (mentioning)
confidence: 99%
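The transfer-learning idea mentioned in the excerpt (reuse knowledge from one problem for another) can be sketched in a few lines: freeze a pretrained feature extractor and train only a small new classifier head on the target task. This is a conceptual toy, not the method of [19]; the "pretrained" extractor and the data are random stand-ins:

```python
# Conceptual transfer-learning sketch (assumption: toy data, not real
# pretrained weights): a frozen feature extractor plus a newly trained
# logistic-regression head for the target task.
import math
import random

random.seed(0)
DIM = 8

# Frozen "pretrained" extractor: a fixed random linear map standing in for
# a network trained on the source problem. Its weights are never updated.
W_frozen = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]

def extract(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_frozen]

# Toy target task: label = sign of the first raw input dimension.
data = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
labels = [1.0 if x[0] > 0 else 0.0 for x in data]
feats = [extract(x) for x in data]

# New head: logistic regression trained by full-batch gradient descent.
w_head = [0.0] * DIM
b_head = 0.0

def predict(f):
    z = sum(w * fi for w, fi in zip(w_head, f)) + b_head
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss():
    return sum(-y * math.log(predict(f) + 1e-12)
               - (1 - y) * math.log(1 - predict(f) + 1e-12)
               for f, y in zip(feats, labels)) / len(feats)

loss_before = mean_loss()
for _ in range(300):
    grad_w, grad_b = [0.0] * DIM, 0.0
    for f, y in zip(feats, labels):
        err = predict(f) - y           # gradient of the logistic loss
        for i in range(DIM):
            grad_w[i] += err * f[i]
        grad_b += err
    for i in range(DIM):
        w_head[i] -= 0.05 * grad_w[i] / len(feats)
    b_head -= 0.05 * grad_b / len(feats)
loss_after = mean_loss()

accuracy = sum((predict(f) > 0.5) == (y > 0.5)
               for f, y in zip(feats, labels)) / len(feats)
print(loss_before, loss_after, accuracy)
```

Only `w_head` and `b_head` are learned; `W_frozen` is reused as-is, which is the essence of the transfer-learning setups surveyed in the excerpt.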