2017
DOI: 10.3390/s17071694
Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

Abstract: Accurate emotion recognition from speech is important for applications such as smart health care, smart entertainment, and other smart services. High-accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, covering both speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate, and short-term energy.
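
The abstract's feature list can be made concrete with a short extraction sketch. The code below is a minimal illustration, assuming a librosa-based pipeline and generic settings (16 kHz audio, mean/std pooling, an LPC-root formant estimate); it is not the paper's exact implementation, whose frame sizes and feature definitions are not given in this excerpt.

```python
# Minimal sketch (assumed settings, not the paper's) of the five feature families
# named in the abstract: MFCC, pitch, formant, short-term zero-crossing rate,
# and short-term energy.
import librosa
import numpy as np

def stats(m):
    """Pool frame-level features into utterance-level mean/std statistics."""
    m = np.atleast_2d(m)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def extract_features(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # MFCC per frame
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)               # pitch contour
    zcr = librosa.feature.zero_crossing_rate(y)[0]              # short-term zero-crossing rate
    energy = librosa.feature.rms(y=y)[0] ** 2                   # short-term energy (squared RMS)
    # Rough formant estimate: frequencies of the LPC polynomial roots.
    a = librosa.lpc(y, order=int(2 + sr / 1000))
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    formants = np.sort(np.angle(roots) * sr / (2 * np.pi))[:3]  # first three formants in Hz
    return np.concatenate([stats(mfcc), stats(f0), stats(zcr), stats(energy), formants])

# Usage (hypothetical file): vec = extract_features("utterance.wav")
```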

Cited by 111 publications (57 citation statements) | References 27 publications

“…These high-level features are fed to an SVM classifier connected to each DBN to predict the speaker's emotion, and the final decision is made by majority voting. Lian et al. [26] used a DBN for feature learning to obtain hidden features from speech and an SVM classifier for emotion prediction, achieving high accuracy in SER on the CASIA Chinese dataset. Hajar and Hasan [27] proposed an SER method that splits the speech signal into frames, extracts MFCC features, and converts them into spectrograms, from which a keyframe is selected to represent the whole audio utterance.…”
Section: Convolutional Neural Network (CNN)-based SER
confidence: 99%
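
The DBN-plus-SVM arrangement described above can be sketched as follows. This is an assumed, simplified stand-in rather than the authors' pipeline: scikit-learn's BernoulliRBM layers approximate a DBN, the data is synthetic, and hard voting across three branches illustrates the majority-voting step.

```python
# Sketch of DBN-style feature learning feeding an SVM, with majority voting over
# several branches; RBM stacks stand in for a DBN and all data is synthetic.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

def dbn_svm_branch(seed):
    """One branch: scale -> two stacked RBMs (DBN stand-in) -> SVM classifier."""
    return Pipeline([
        ("scale", MinMaxScaler()),                      # RBMs expect inputs in [0, 1]
        ("rbm1", BernoulliRBM(n_components=64, random_state=seed)),
        ("rbm2", BernoulliRBM(n_components=32, random_state=seed)),
        ("svm", SVC(kernel="rbf", C=1.0)),
    ])

# Hard (majority) voting across three independently initialized branches.
voter = VotingClassifier(
    estimators=[(f"branch{i}", dbn_svm_branch(seed=i)) for i in range(3)],
    voting="hard",
)

X = np.random.rand(200, 120)                  # synthetic utterance-level feature vectors
y_labels = np.random.randint(0, 6, size=200)  # synthetic emotion labels
voter.fit(X, y_labels)
print(voter.predict(X[:5]))
```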
“…In the literature, many methods have used CNN models for SER with different types of input to extract discriminative features from speech signals [9]. The works in [26–32] applied deep learning approaches to SER to improve the recognition rate for real-time spontaneous SER on different speech datasets (IEMOCAP, SAVEE, RAVDESS, CASIA, TIMIT, etc.). Accuracy increases, but the computational cost of the model also increases due to the use of large pre-trained CNN architectures.…”
Section: Convolutional Neural Network (CNN)-based SER
confidence: 99%
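
For orientation, a deliberately small spectrogram-plus-CNN classifier of the kind this survey paragraph contrasts with large pre-trained architectures might look like the sketch below; the architecture, input shape, and six-class setup are illustrative assumptions, not taken from any of the cited works.

```python
# Minimal spectrogram-based CNN sketch for SER (assumed architecture, not from the
# cited works): librosa computes a log-mel spectrogram, PyTorch runs a tiny CNN.
import librosa
import numpy as np
import torch
import torch.nn as nn

def log_mel(path, sr=16000, n_mels=64, frames=128):
    """Load an utterance and return a fixed-size log-mel spectrogram (1, n_mels, frames)."""
    y, sr = librosa.load(path, sr=sr)
    logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    if logmel.shape[1] < frames:                       # pad or crop along time
        logmel = np.pad(logmel, ((0, 0), (0, frames - logmel.shape[1])))
    return torch.tensor(logmel[:, :frames], dtype=torch.float32).unsqueeze(0)

class SmallSERCNN(nn.Module):
    """Two conv blocks and a linear head -- far smaller than pre-trained CNNs."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Usage (hypothetical file): logits = SmallSERCNN()(log_mel("utterance.wav").unsqueeze(0))
```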
“…Zhu et al. [46] used a combination of acoustic features based on MFCC, pitch, formant, short-term ZCR, and short-term energy to recognize speech emotion. They extracted the most discriminating features and performed classification using a deep belief network (DBN) with an SVM.…”
Section: Related Studies
confidence: 99%
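
The "most discriminating features" step can be illustrated with a generic selection sketch. The method below (ANOVA F-score ranking with scikit-learn's SelectKBest) is an assumption for illustration only and is not necessarily the selection criterion used in [46]; the data is synthetic.

```python
# Sketch of keeping the most discriminating acoustic features before classification,
# using ANOVA F-score ranking; the feature matrix and labels here are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.rand(200, 120)                  # utterance-level feature vectors (synthetic)
y_labels = np.random.randint(0, 6, size=200)  # emotion labels (synthetic)

selector = SelectKBest(score_func=f_classif, k=40)
X_selected = selector.fit_transform(X, y_labels)  # keep the 40 highest-scoring features
print(X_selected.shape, selector.get_support().sum())
```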
“…Torres-Valencia et al. [12] classified two-dimensional emotions using an HMM, and studies using the C4.5 decision tree [13], K-nearest neighbor (KNN) [14,15], and linear discriminant analysis (LDA) [16] have been reported. There have also been studies using deep-learning methods, such as convolutional neural networks (CNNs) [17,18], the deep belief network (DBN) [19], and the sparse autoencoder [20], or models integrating machine learning and deep learning [21,22]. Beyond the simple categorization of emotions, studies that apply classification models to various fields have been actively conducted [23–26].…”
Section: Introduction
confidence: 99%