Human speech is not only a rich medium of verbal communication but also a carrier of emotion. The past decade has seen extensive research on speech data, which is especially important for human-computer interaction as well as healthcare, security and entertainment. This paper proposes the TLEFuzzyNet model, a three-stage pipeline for emotion recognition from speech. The first stage performs feature extraction through data augmentation of the speech signals and extraction of Mel spectrograms; the second stage feeds these spectrograms to three pre-trained transfer learning CNN models, namely ResNet18, Inception_v3 and GoogleNet, whose prediction scores are passed to the third stage. In the final stage, we assign fuzzy ranks using a modified Gompertz function, which produces the final prediction scores after considering the individual scores from the three CNN models. We have used the Surrey Audio-Visual Expressed Emotion (SAVEE), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Berlin Database of Emotional Speech (EmoDB) datasets to evaluate the TLEFuzzyNet model, which achieves state-of-the-art performance and is hence a dependable framework for speech emotion recognition (SER). All code is available at: https://github.com/KaramSahoo/SpeechEmotionRecognitionFuzzy
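To make the third stage concrete, below is a minimal sketch of a Gompertz-based fuzzy-rank fusion over softmax score vectors from the three CNNs. The abstract does not give the exact re-parameterization of the modified Gompertz function, so the rank mapping used here (rank = 1 - exp(-exp(-2·score))) is an assumption borrowed from fuzzy-rank ensemble literature, and all names and numbers are illustrative.

```python
# Sketch of fuzzy-rank fusion; the Gompertz form below is an assumption,
# not necessarily the paper's exact modified Gompertz re-parameterization.
import numpy as np

def gompertz_fuzzy_rank(scores: np.ndarray) -> np.ndarray:
    """Map per-class confidence scores in [0, 1] to fuzzy ranks.

    Lower rank values indicate stronger confidence.
    """
    return 1.0 - np.exp(-np.exp(-2.0 * scores))

def fuzzy_rank_fusion(score_list) -> int:
    """Fuse softmax score vectors from several CNNs into one prediction.

    score_list: list of (n_classes,) arrays, one per base model.
    """
    ranks = [gompertz_fuzzy_rank(s) for s in score_list]
    rank_sum = np.sum(ranks, axis=0)                    # fused fuzzy ranks
    complement = np.sum([1.0 - s for s in score_list], axis=0)
    fused = rank_sum * complement                       # penalize weak, disagreeing models
    return int(np.argmin(fused))                        # smallest fused rank wins

# Example: three models (e.g. ResNet18, Inception_v3, GoogleNet) voting
# over four hypothetical emotion classes.
resnet = np.array([0.70, 0.10, 0.15, 0.05])
inception = np.array([0.60, 0.20, 0.10, 0.10])
googlenet = np.array([0.55, 0.25, 0.10, 0.10])
print(fuzzy_rank_fusion([resnet, inception, googlenet]))  # -> 0
```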
The Human Activity Recognition (HAR) problem leverages pattern recognition to classify physical human activities captured by several sensor modalities. Remote monitoring of an individual's activities has gained importance due to the reduction in travel and physical activity during the pandemic. Research on HAR enables one person to remotely monitor or recognize another person's activity via ubiquitous mobile devices or sensor-based Internet of Things (IoT) systems. Our proposed work focuses on the accurate classification of daily human activities from both accelerometer and gyroscope sensor data after converting them into spectrogram images. Features are then extracted by leveraging the pre-trained weights of two popular and efficient transfer learning convolutional neural network (CNN) models. Finally, a wrapper-based feature selection method is employed to select the optimal feature subset, which both reduces the training time and improves the final classification performance. The proposed HAR model has been tested on three benchmark datasets, namely HARTH, KU-HAR and HuGaDB, achieving classification accuracies of 88.89%, 97.97% and 93.82%, respectively. Notably, the proposed HAR model improves the overall classification accuracy by about 21%, 20% and 6% while utilizing only 52%, 45% and 60% of the original feature set for the HuGaDB, KU-HAR and HARTH datasets, respectively. This demonstrates the effectiveness of our wrapper-based feature selection HAR methodology.
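As a rough illustration of the first step, the sketch below converts a windowed tri-axial sensor signal into a stacked log-spectrogram image suitable for a CNN. The 50 Hz sampling rate, window length and STFT parameters are illustrative assumptions, not values from the paper.

```python
# Sketch of the sensor-to-spectrogram conversion; all parameter values
# here (fs, nperseg, noverlap, window length) are assumptions.
import numpy as np
from scipy.signal import spectrogram

def sensor_window_to_spectrogram(window: np.ndarray, fs: float = 50.0) -> np.ndarray:
    """Convert a (n_samples, n_axes) sensor window into a stacked
    log-spectrogram 'image' of shape (n_axes, freq_bins, time_bins)."""
    channels = []
    for axis in range(window.shape[1]):
        f, t, Sxx = spectrogram(window[:, axis], fs=fs, nperseg=64, noverlap=32)
        channels.append(np.log1p(Sxx))  # log scaling compresses dynamic range
    return np.stack(channels, axis=0)

# Example: a 2.56 s window (128 samples at 50 Hz) of 3-axis accelerometer data.
rng = np.random.default_rng(0)
img = sensor_window_to_spectrogram(rng.standard_normal((128, 3)))
print(img.shape)  # (3, 33, 3)
```

The resulting multi-channel image can then be passed to a pre-trained CNN backbone, with a wrapper-based selector searching over the extracted deep features afterwards.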
Music has been an integral part of the history of humankind, with some theories suggesting it is even older than speech itself. Music is an ordered succession of tones and harmonies that produces sound characterised by melody and rhythm. Our paper proposes an ensemble deep learning musical instrument classification (MIC) framework, named MIC_FuzzyNET, which aims to classify the dominant instruments present in musical clips. First, the musical data are converted into three different spectrograms: the Constant-Q Transform, the semitone spectrogram and the Mel spectrogram, which are then stacked to form three-channel 2D data. This stacked spectrogram is fed to transfer learning models, namely EfficientNetV2 and ResNet18, which output preliminary classification scores. Finally, a fuzzy rank ensemble model assigns ranks to the classifiers on the test data to produce enhanced final classification scores, reducing the errors and biases of the constituent CNN architectures. Our proposed framework has been evaluated on the Persian Classical Music Instrument Recognition (PCMIR) and Instrument Recognition in Musical Audio Signals (IRMAS) datasets, where it achieves considerably high accuracy, making it a robust MIC model.
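A minimal sketch of the spectrogram-stacking stage is given below, assuming librosa is available. The semitone channel is approximated here as a 12-bins-per-octave CQT, and the CQT settings and 224x224 target size are illustrative assumptions rather than the paper's actual configuration.

```python
# Sketch of stacking CQT, semitone and Mel spectrograms into a
# 3-channel image; resolutions and target size are assumptions.
import numpy as np
import librosa
from scipy.ndimage import zoom

def stacked_spectrogram(y: np.ndarray, sr: int, size: int = 224) -> np.ndarray:
    """Return a (3, size, size) 'image': CQT, semitone and Mel channels."""
    cqt = np.abs(librosa.cqt(y, sr=sr, bins_per_octave=24, n_bins=168))
    semi = np.abs(librosa.cqt(y, sr=sr, bins_per_octave=12, n_bins=84))
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    channels = []
    for s in (cqt, semi, mel):
        s_db = librosa.power_to_db(s**2, ref=np.max)  # log-power scale
        s_rs = zoom(s_db, (size / s.shape[0], size / s.shape[1]), order=1)
        channels.append(s_rs[:size, :size])           # guard against rounding overshoot
    return np.stack(channels, axis=0)

# Example: a 3-second synthetic 440 Hz tone.
sr = 22050
y = librosa.tone(440.0, sr=sr, duration=3.0)
print(stacked_spectrogram(y, sr).shape)  # (3, 224, 224)
```

Stacking the three views as channels lets an ImageNet-pretrained backbone such as EfficientNetV2 or ResNet18 consume them directly as an RGB-shaped input.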