Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques

Tuncer, Türker; Doğan, Şengül; Acharya, U. Rajendra

doi:10.1016/j.knosys.2020.106547

Cited by 102 publications

(35 citation statements)

References 76 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…A classifier is chosen as the error/loss value calculator, the feature vector with the minimum error value is chosen as the optimum feature vector. INCA selects a variant number of features for variant problems [ 41 ]. In this study, we selected the feature range from 50 to 1000 .…”

Section: Methodsmentioning

confidence: 99%

Multi-classification deep CNN model for diagnosing COVID-19 using iterative neighborhood component analysis and iterative ReliefF feature selection techniques with X-ray images

Aslan

Koca

Kobat

et al. 2022

Chemometrics and Intelligent Laboratory Systems

Self Cite

View full text Add to dashboard Cite

Section: Methodsmentioning

confidence: 99%

Multi-classification deep CNN model for diagnosing COVID-19 using iterative neighborhood component analysis and iterative ReliefF feature selection techniques with X-ray images

Aslan

Koca

Kobat

et al. 2022

Chemometrics and Intelligent Laboratory Systems

Self Cite

View full text Add to dashboard Cite

“…Nowadays, an optimal feature selection and a better machine or classifier are challenging tasks for a robust SER system 32 . For optimal features, the researchers have utilized advanced techniques and deep learning approaches for an emotional speech feature representations due to enormous achievements in different fields for recognition tasks 12 . Hence, the researchers have been inspired by the performances of the deep learning approaches, so they have developed various techniques for the SER and have increased the level of accuracy 33 .…”

Section: Literature Reviewmentioning

confidence: 99%

“…32 For optimal features, the researchers have utilized advanced techniques and deep learning approaches for an emotional speech feature representations due to enormous achievements in different fields for recognition tasks. 12 Hence, the researchers have been inspired by the performances of the deep learning approaches, so they have developed various techniques for the SER and have increased the level of accuracy. 33 Similarly, 34 developed a method to recognize the emotions from a speech spectrogram using the CNN model.…”

Section: Literature Reviewmentioning

confidence: 99%

See 1 more Smart Citation

Optimal feature selection based speech emotion recognition using two‐stream deep convolutional neural network

Mustaqeem

Kwon

2021

Int J of Intelligent Sys

View full text Add to dashboard Cite

Speech signal processing is an active area of research, the most dominant source of exchanging information among human beings, and the best way for human–computer interaction (HCI). Human behavior assessments and emotion recognition from a speech signal, such as speech emotion recognition (SER) is an emerging HCI area of exploration with various real time claims. The performance of an efficient SER system depends on feature learning, which include salient and discriminative information such as high‐level deep features. In this paper, we proposed a two‐stream deep convolutional neural network with an iterative neighborhood component analysis (INCA) to learn mutually spatial‐spectral features and select the most discriminative optimal features for the final prediction. Our model is composed of two channels, and each channel is associated with the convolutional neural network structure to extract cues from the oral signals. The first channel extracts feature from the spectral domain, and the second channel extracts features from the spatial domain, which are then fused and fed to the INCA to remove the severance and select the optimal features for the final model training. The joint refine features are passed from the fully connected network with a softmax classifier to yield the predictions of the different emotions. We trained our proposed system using three benchmarks, which included the EMO‐DB, SAVEE, and RAVDESS emotional speech corpora, and we tested the prediction performance to secure 95%, 82%, and 85% recognition rates. The performance of the system shows the effectiveness and significance of the proposed system.

show abstract

“…Compared with traditional features [13], log-mel spectrograms try to match human hearing by preserving both the frequency domain as well as the time domain information [14]. It is worth noting that with the introduction of the Attention Mechanism structure [15], which includes various variants such as the Transformer, good results have been achieved in both classification and dimensionality tasks of SER [16].…”

Section: Introductionmentioning

confidence: 99%

A Residual Multi-Scale Convolutional Transformer Network with Chunk-level Log-Mel Spectrograms for Speech Emotion Recognition

Yan¹,

Wang²,

Parada-Cabaleiro³

et al. 2022

Preprint

View full text Add to dashboard Cite

<p>The great variety of human emotional expression as well as the differences in the ways they perceive and annotate them make Speech Emotion Recognition (SER) an ambiguous and challenging task. With the development of deep learning, long-term progress has been made in supervised SER systems. However, the existing convolutional neural networks present certain limitations, such as their inability to well capture global features, which contain important emotional information. In addition, due to the subjective nature and continuity of emotion, the instance segments in which emotional speech is typically segmented do not fully reflect the true labels and cannot describe dynamic temporal changes. Thus, accurate emotional representation cannot be learnt in the process of feature extraction. In order to overtake these limitations, we propose an end-to-end network only for speech that maps sequences of different lengths to a fixed number of chunks and strictly preserves the order of chunks by adaptively adjusting their overlap. Subsequently, it extracts log-mel spectrogram features from chunk-level segments and feeds them into the Residual Multi-Scale Convolutional Neutral Networks with Transformer(RMSCTx) model framework. Finally, by keeping the order of the chunk-level segments, a temporal domain mean layer is used to further extract utterance-level feature representations. With this method, we perform multidimensional SER, i. e., the prediction of arousal, valence, and dominance. The experimental results on three popular corpora demonstrate not only the superiority of our approach, but also the robustness of the model for SER, showing an improvement of the recognition accuracy in the newest version of the public dataset MSP-Podcast (1.9).</p>

show abstract

Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques

Cited by 102 publications

References 76 publications

Multi-classification deep CNN model for diagnosing COVID-19 using iterative neighborhood component analysis and iterative ReliefF feature selection techniques with X-ray images

Multi-classification deep CNN model for diagnosing COVID-19 using iterative neighborhood component analysis and iterative ReliefF feature selection techniques with X-ray images

Optimal feature selection based speech emotion recognition using two‐stream deep convolutional neural network

A Residual Multi-Scale Convolutional Transformer Network with Chunk-level Log-Mel Spectrograms for Speech Emotion Recognition

Contact Info

Product

Resources

About