Emotion recognition is a topic of significant interest in assistive robotics due to the need to equip robots with the ability to comprehend human behavior, facilitating their effective interaction in our society. Consequently, efficient and dependable emotion recognition systems supporting optimal human-machine communication are required. Multiple modalities (including speech, audio, text, images, and videos) are typically exploited in emotion recognition tasks. Much of the relevant research merges multiple data modalities and trains deep learning models on low-level data representations. However, most existing emotion databases are not large (or complex) enough for machine learning approaches to learn detailed representations. This paper explores modality-specific pre-trained transformer frameworks for self-supervised learning of speech and text representations, enabling data-efficient emotion recognition with state-of-the-art performance. The model applies feature-level fusion with nonverbal cues from motion-capture data to provide multimodal speech emotion recognition. Trained on the publicly available IEMOCAP dataset, it achieves an overall accuracy of 77.58% on four emotions, outperforming state-of-the-art approaches.
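A minimal sketch of the feature-level fusion idea this abstract describes: pre-extracted speech and text transformer embeddings are concatenated with motion-capture features before a shared classification head. The embedding dimensions, layer sizes, and class count of four are illustrative assumptions, not the paper's reported architecture.

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, speech_dim=768, text_dim=768, mocap_dim=165, n_emotions=4):
        super().__init__()
        # Dimensions are hypothetical; 768 matches common transformer encoders.
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim + mocap_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_emotions),
        )

    def forward(self, speech_emb, text_emb, mocap_feat):
        # Feature-level fusion: concatenate modality features before classifying.
        fused = torch.cat([speech_emb, text_emb, mocap_feat], dim=-1)
        return self.head(fused)

# Dummy batch of 8 utterances to show the expected shapes
model = FusionClassifier()
logits = model(torch.randn(8, 768), torch.randn(8, 768), torch.randn(8, 165))
print(logits.shape)  # torch.Size([8, 4])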
In recent years, deep learning has been applied to many medical imaging fields, including medical image processing, bioinformatics, and medical image classification, segmentation, and prediction tasks. Computer-aided detection systems have been widely adopted for brain tumor classification, prediction, detection, diagnosis, and segmentation. This work proposes a novel model that combines a Bayesian algorithm with depth-wise separable convolutions for accurate classification and prediction of brain tumors. We combine Bayesian modeling and Convolutional Neural Network learning to produce accurate predictions and give radiologists a means to classify Magnetic Resonance Imaging (MRI) images rapidly. After thorough experimental analysis, our proposed model outperforms other state-of-the-art models in validation accuracy, training accuracy, F1-score, recall, and precision. It achieves 99.03% training accuracy and 94.32% validation accuracy, with F1-score, precision, and recall values of 0.94, 0.95, and 0.94, respectively. To the best of our knowledge, this is the first neural network model that combines depth-wise separable convolutions with a Bayesian algorithm using encoders.
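A sketch of the depth-wise separable convolution block this model builds on: a per-channel (depth-wise) convolution followed by a 1x1 (point-wise) convolution. The Bayesian component is approximated here with Monte Carlo dropout, a common stand-in for Bayesian inference in CNNs; that substitution, and all layer sizes, are assumptions rather than the paper's exact formulation.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depth-wise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Point-wise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()
        self.drop = nn.Dropout2d(0.2)  # kept active at test time for MC sampling

    def forward(self, x):
        return self.drop(self.act(self.pointwise(self.depthwise(x))))

# MC-dropout style predictive mean over several stochastic forward passes
block = DepthwiseSeparableConv(1, 16)
block.train()  # keep dropout stochastic
x = torch.randn(2, 1, 64, 64)  # e.g., grayscale MRI patches (hypothetical size)
mean_out = torch.stack([block(x) for _ in range(10)]).mean(dim=0)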
The COVID-19 pandemic has had a significant impact on many lives and on the economies of many countries since late December 2019. Early detection with high accuracy is essential to help break the chain of transmission. Several radiological methodologies, such as CT scans and chest X-rays, have been employed to diagnose and monitor COVID-19, but these methodologies are time-consuming and require trial and error. Several studies are currently applying machine learning techniques to deal with COVID-19. This study exploits the latent embeddings of variational autoencoders combined with ensemble techniques to propose three effective EVAE-Net models for detecting COVID-19. Two encoders are trained on chest X-ray images to generate two feature maps. The feature maps are concatenated and passed to either a combined or an individual reparameterization phase to generate latent embeddings by sampling from a distribution. The latent embeddings are concatenated and passed to a classification head. The chest X-ray images come from the COVID-19 Radiography Dataset on Kaggle. All three models perform satisfactorily, with the best achieving 99.19% and 98.66% accuracy on four classes and three classes, respectively.
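A compact sketch of the dual-encoder pipeline the abstract outlines: two encoders produce feature maps that are concatenated, reparameterized into a latent embedding via sampling, and passed to a classification head. Encoder depth, feature and latent dimensions, and the single combined reparameterization shown here are illustrative assumptions about one of the three EVAE-Net variants.

import torch
import torch.nn as nn

class DualEncoderVAEClassifier(nn.Module):
    def __init__(self, feat_dim=128, latent_dim=32, n_classes=4):
        super().__init__()
        def make_encoder():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
        self.enc_a, self.enc_b = make_encoder(), make_encoder()
        self.to_mu = nn.Linear(2 * feat_dim, latent_dim)
        self.to_logvar = nn.Linear(2 * feat_dim, latent_dim)
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        # Concatenate the two encoders' feature maps
        feats = torch.cat([self.enc_a(x), self.enc_b(x)], dim=-1)
        mu, logvar = self.to_mu(feats), self.to_logvar(feats)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.classifier(z), mu, logvar

model = DualEncoderVAEClassifier()
logits, mu, logvar = model(torch.randn(2, 1, 224, 224))  # chest X-ray batch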
Natural disasters, such as floods, can cause significant damage to both the environment and human life. Rapid and accurate identification of affected areas is crucial for effective disaster response and recovery efforts. In this paper, we evaluate the performance of state-of-the-art (SOTA) computer vision models for flood image classification using a semi-supervised learning approach on the FloodNet dataset. To achieve this, we trained 11 SOTA models, modified to suit the classification task at hand. Furthermore, we introduced a technique of varying the uncertainty offset λ in the models to analyze its impact on performance. The models were evaluated using standard classification metrics: loss, accuracy, F1 score, precision, recall, and ROC-AUC. The results provide a quantitative comparison of the performance of different CNN architectures for flood image classification, as well as of the impact of different uncertainty offsets λ. These findings can aid the development of more accurate and efficient disaster response and recovery systems, helping to minimize the impact of natural disasters.
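A hedged sketch of one common way an uncertainty offset λ can enter a semi-supervised pipeline: pseudo-labels from unlabeled images are kept only when predictive confidence clears a base threshold shifted by λ. This is a plausible reading of the offset for illustration only, not the paper's actual definition, and the threshold, class count, and function names are hypothetical.

import torch

def select_pseudo_labels(logits, base_threshold=0.9, lam=0.0):
    """Return (indices, labels) of unlabeled samples confident enough to keep."""
    probs = torch.softmax(logits, dim=-1)
    confidence, labels = probs.max(dim=-1)
    keep = confidence >= (base_threshold - lam)  # lambda relaxes or tightens the cut
    return keep.nonzero(as_tuple=True)[0], labels[keep]

# Varying lambda trades pseudo-label quantity against quality
logits = torch.randn(16, 10)  # e.g., a batch over 10 hypothetical scene classes
for lam in (0.0, 0.1, 0.2):
    idx, _ = select_pseudo_labels(logits, lam=lam)
    print(f"lambda={lam}: kept {len(idx)} of 16 samples")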