Speech emotion recognition (SER) plays a significant role in human–machine interaction. Recognizing and precisely classifying emotion from speech is challenging because a machine cannot understand the context of an utterance. Accurate emotion classification requires emotionally relevant features to be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they cannot adequately capture the emotional state of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotion datasets. A correlation-based feature selection technique is then applied to the extracted features to retain the most discriminative ones for SER. For the classification of emotions, we use support vector machine, random forest, k-nearest neighbors, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). In speaker-dependent experiments, our proposed method achieves accuracies of 95.10% on Emo-DB, 82.10% on SAVEE, 83.80% on IEMOCAP, and 81.30% on RAVDESS. Moreover, for speaker-independent SER, our method yields the best results compared with existing handcrafted-feature-based SER approaches.
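As a concrete illustration of the pipeline this abstract describes, here is a minimal sketch combining pretrained-network features, correlation-based feature selection, and a classical classifier. The correlation threshold of 0.9, the file names, and the use of an SVM with an RBF kernel are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: deep features -> correlation-based selection -> SVM classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def select_by_correlation(X, threshold=0.9):
    """Keep a feature only if it is weakly correlated with all kept ones."""
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return np.array(keep)

# Hypothetical inputs: deep features extracted beforehand (e.g. activations
# of a pretrained CNN's penultimate layer on speech spectrograms) and labels.
X = np.load("deep_features.npy")
y = np.load("labels.npy")

cols = select_by_correlation(X)
X_train, X_test, y_train, y_test = train_test_split(X[:, cols], y, stratify=y)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```

The other classifiers mentioned in the abstract (random forest, k-nearest neighbors, a neural network) would drop in at the `SVC` line with the corresponding scikit-learn estimators.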
The advent of new devices, technologies, and machine learning techniques, together with the availability of large free speech corpora, has enabled rapid and accurate speech recognition. Over the last two decades, researchers and organizations have experimented extensively with new techniques and their applications in speech processing systems. Speech-command-based applications exist in robotics, IoT, ubiquitous computing, and various human-computer interfaces. Several researchers have worked on improving the efficiency of speech-command-based systems using the Speech Commands dataset; however, none of them addressed noise in that dataset. Noise is one of the major challenges in any speech recognition system: real-world noise is highly variable and unavoidable, and it degrades the performance of systems that have not learned to handle it. We thoroughly analyse the latest trends in speech recognition and evaluate the Speech Commands dataset with several machine-learning-based and deep-learning-based techniques. We propose a novel technique for noise robustness that augments the training data with noise. The proposed technique is tested on clean and noisy data, along with locally generated data, and achieves considerably better results than existing state-of-the-art techniques, setting a new benchmark.
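A minimal sketch of noise augmentation of the kind this abstract proposes follows: clean waveforms are additively mixed with noise clips at a controlled signal-to-noise ratio. The 0-20 dB SNR range and the array-based inputs are illustrative assumptions, not the paper's exact setting.

```python
# Sketch: augment clean training audio by mixing in noise at a random SNR.
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Additively mix `noise` into `clean` at the requested SNR in dB."""
    # Tile or trim the noise to match the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that clean_power / scaled_noise_power = 10^(snr/10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def augment(waveforms, noise_clips, snr_range=(0, 20), seed=0):
    """Yield one noisy copy of each training waveform at a random SNR."""
    rng = np.random.default_rng(seed)
    for w in waveforms:
        n = noise_clips[rng.integers(len(noise_clips))]
        yield mix_at_snr(w, n, rng.uniform(*snr_range))
```

Training on the union of clean and augmented copies is the standard way such augmentation improves robustness to unseen real-world noise.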
Computer-Aided Language Learning (CALL) is growing because learning new languages is essential for communicating with people of different linguistic backgrounds. Mispronunciation detection, an integral part of CALL, automatically points out errors made by non-native speakers. In this paper, we investigate the mispronunciation detection of Arabic words using a deep convolutional neural network (CNN). For automated pronunciation-error detection, we propose a CNN-feature-based model that extracts features from different layers of AlexNet (layers 6, 7, and 8) to train three machine learning classifiers: k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF). We also use a transfer-learning-based model in which feature extraction and classification are performed automatically. To evaluate the proposed method, we provide a comprehensive comparison against a traditional machine-learning baseline using Mel-frequency cepstral coefficient (MFCC) features with the same three classifiers (KNN, SVM, and RF). Experimental results show that the handcrafted-feature baseline, the transfer-learning-based method, and classification based on deep features extracted from AlexNet achieve average accuracies of 73.67%, 85%, and 93.20% on Arabic words, respectively. Thus, the proposed method with feature selection achieves the best average accuracy, 93.20%, of all the methods considered.
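As a sketch of the feature-extraction step described above, the snippet below pulls activations from AlexNet's second fully connected layer (fc7, corresponding to "layer 7" here) using torchvision. The spectrogram-image input, file name, and preprocessing recipe are assumptions for illustration; the authors' exact frontend may differ.

```python
# Sketch: extract fc7 features from a pretrained AlexNet for a
# classical classifier (KNN / SVM / RF) to consume.
import torch
from torchvision import models, transforms
from PIL import Image

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
# Truncate the classifier head so the forward pass ends after fc7's ReLU
# (classifier indices 0-5: Dropout, fc6, ReLU, Dropout, fc7, ReLU).
fc7 = torch.nn.Sequential(
    alexnet.features, alexnet.avgpool, torch.nn.Flatten(),
    *list(alexnet.classifier.children())[:6],
)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("word_spectrogram.png").convert("RGB")  # hypothetical input
with torch.no_grad():
    feats = fc7(preprocess(img).unsqueeze(0))  # shape: (1, 4096)
```

Slicing the classifier at a different index would yield the fc6 or fc8 activations ("layers 6 and 8") instead.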
Safety-critical, distributed real-time systems, such as avionics, automotive, and factory automation and control systems, require efficient communication mechanisms between their nodes in order to deliver information within defined time frames. Protocols that follow the time-triggered architecture paradigm guarantee timeliness under a given fault and load hypothesis through a stringent, equally spaced time-division multiple access (TDMA) scheme. This, however, comes at the cost of poor channel and bandwidth utilisation in real-world scenarios, where functionality and transmission requirements often differ considerably between nodes. Here, we propose a new approach and protocol, INCUS, that allows the slot length of each node to be configured according to its payload requirements. We show the feasibility of our approach while retaining the level of reliability required for safety-critical real-time systems. Our analysis shows an almost twofold improvement in efficiency in a typical automotive brake-by-wire scenario.
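A back-of-the-envelope sketch of why payload-sized slots help: under a fixed TDMA scheme every node is allotted the largest required slot, whereas a configurable scheme such as the one described sizes each slot to its node's payload. All numbers below (payload sizes, 10 Mbit/s link, 8 µs guard time) are illustrative assumptions, not figures from the INCUS evaluation.

```python
# Sketch: TDMA round duration with fixed versus payload-sized slots.
LINK_BPS = 10_000_000   # link bit rate (bit/s), assumed
GUARD_S = 8e-6          # inter-slot guard time (s), assumed

# Per-node payloads in bits; one node (e.g. a sensor hub) needs far more
# bandwidth than the others, which is the mismatch the protocol targets.
payload_bits = [64 * 8, 8 * 8, 8 * 8, 16 * 8, 8 * 8]

def round_time(slot_bits):
    """Total duration of one TDMA round for the given slot sizes."""
    return sum(bits / LINK_BPS + GUARD_S for bits in slot_bits)

fixed = round_time([max(payload_bits)] * len(payload_bits))
sized = round_time(payload_bits)
print(f"fixed slots: {fixed * 1e6:.1f} us per round")
print(f"sized slots: {sized * 1e6:.1f} us per round")
print(f"improvement: {fixed / sized:.2f}x")
```

With these example numbers the payload-sized round is roughly 2.4 times shorter, consistent in spirit with the near-twofold efficiency gain the abstract reports for its brake-by-wire scenario.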