“…There are two key points that impact the performance of speech emotion recognition [4, 5, 6, 7, 8]:
- The first is speech feature selection. Because many kinds of features can be extracted from a speech sample, it is difficult to know which ones should be chosen as the most suitable for emotion recognition. Some work [1, 2, 4, 5, 9, 10, 11] shows that prosody features (e.g., pitch, energy, zero-crossing rate) are important, while other work [4, 5, 8, 9, 10] shows that quality features (e.g., formant frequencies and spectral features) are helpful for speech emotion recognition.
…”
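
To make two of the prosody features named in the excerpt concrete, here is a minimal sketch of frame-level short-time energy and zero-crossing rate. It is not taken from the quoted paper: the function names, the 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, and the synthetic test signals are all illustrative assumptions, and pitch estimation is left out for brevity.

```python
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split a sequence into overlapping frames, dropping any incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def mean(values):
    return sum(values) / len(values)


# Assumed settings (hypothetical, not from the source): 16 kHz audio,
# 25 ms frames (400 samples), 10 ms hop (160 samples).
SAMPLE_RATE = 16000
FRAME_LEN, HOP = 400, 160

# Synthetic stand-ins for speech: a 200 Hz tone (voiced-like) and
# low-level Gaussian noise (unvoiced-like).
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.05) for _ in range(SAMPLE_RATE)]

tone_frames = frame_signal(tone, FRAME_LEN, HOP)
noise_frames = frame_signal(noise, FRAME_LEN, HOP)

tone_energy = mean([short_time_energy(f) for f in tone_frames])
noise_energy = mean([short_time_energy(f) for f in noise_frames])
tone_zcr = mean([zero_crossing_rate(f) for f in tone_frames])
noise_zcr = mean([zero_crossing_rate(f) for f in noise_frames])

print("energy  tone:", tone_energy, " noise:", noise_energy)
print("ZCR     tone:", tone_zcr, " noise:", noise_zcr)
```

On these synthetic signals the tone has high energy and a low zero-crossing rate, while the noise shows the opposite pattern. That contrast is one reason such features are commonly used as coarse cues; whether they are the most suitable features for emotion recognition is exactly the open question the excerpt raises.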