Speech emotion recognition (SER) methods rely on frames to analyze speech data. However, existing methods typically divide a speech sample into smaller frames and assign each a single emotional tag, which ignores the possibility that multiple emotion tags coexist within one speech sample. To address this limitation, we present a novel self-labeling learning ensemble via a deep recurrent neural network and self-representation (En-DRNN-SR) for SER. The method automatically segments a speech sample into frames; a deep recurrent neural network (DRNN) then learns deep features from these frames; next, a self-representation model is built to obtain a relational degree matrix; finally, this matrix is used to divide the frames into three parts: key emotional frames, compatible emotional frames, and noise frames. The emotion tags of the compatible emotional frames are learned adaptively and cyclically from the key emotional frames via the relational degree matrix, while the tags assigned to the compatible frames are checked against those of the key frames. Additionally, we introduce a new self-labeling criterion based on fuzzy membership degree for SER. To evaluate the feasibility and effectiveness of the proposed En-DRNN-SR, we conducted extensive experiments on the IEMOCAP, EMODB, and SAVEE databases, where it achieved 69.13%, 82.83%, and 52.31%, respectively, outperforming all competing algorithms. These results demonstrate that the proposed approach surpasses state-of-the-art SER methods in both feature learning and classification.
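The core pipeline stage described above, building a self-representation over frame-level deep features to obtain a relational degree matrix and then splitting the frames into key, compatible, and noise groups, might be prototyped as follows. This is a minimal sketch, not the paper's exact formulation: the ridge-regularised self-representation objective, the symmetrisation of the coefficient matrix, the quantile thresholds, and all function names are our assumptions.

```python
import numpy as np

def self_representation(X, lam=0.1):
    """Approximate each frame's feature vector as a linear combination of the
    other frames: min_C ||X - C X||_F^2 + lam ||C||_F^2 with zero diagonal
    (assumed ridge-regularised self-representation).

    X : (n_frames, d) matrix of deep features, one row per speech frame.
    Returns a symmetric nonnegative relational degree matrix W (n_frames, n_frames).
    """
    n = X.shape[0]
    G = X @ X.T                                  # Gram matrix over frames
    # Closed-form ridge solution: C = (G + lam I)^{-1} G
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)                     # a frame may not represent itself
    W = 0.5 * (np.abs(C) + np.abs(C.T))          # symmetrise into relational degrees
    return W

def partition_frames(W, key_q=0.7, noise_q=0.2):
    """Split frames into key / compatible / noise sets by total relational
    degree (row sums of W); the quantile cut-offs are illustrative choices."""
    degree = W.sum(axis=1)
    hi = np.quantile(degree, key_q)
    lo = np.quantile(degree, noise_q)
    key = np.where(degree >= hi)[0]              # strongly connected frames
    noise = np.where(degree <= lo)[0]            # weakly connected frames
    compatible = np.setdiff1d(np.arange(len(degree)),
                              np.union1d(key, noise))
    return key, compatible, noise
```

Under this sketch, the tags of the compatible frames would then be propagated from the key frames using the rows of `W` as affinity weights; the noise frames are excluded from self-labeling.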