Humans express and perceive emotions multimodally, and the human sensory system fuses this multimodal information in an intrinsically complex way. Emulating a temporal desynchronisation between modalities, we design an end-to-end neural network architecture, called TA-AVN, that aggregates temporal audio and video information in an asynchronous setting in order to determine the emotional state of a subject. The feature descriptors for the audio and video representations are extracted using simple Convolutional Neural Networks (CNNs), enabling real-time processing. Collecting annotated training data remains an important challenge when training emotion recognition systems, in terms of both the effort and the expertise required. The proposed approach addresses this problem with a natural augmentation technique that achieves a high accuracy rate even when the amount of annotated training data is limited. The framework is tested on three challenging multimodal reference datasets for emotion recognition: the benchmark datasets CREMA-D and RAVDESS, and a dataset from the FG2020 challenge on emotion recognition. The results demonstrate the effectiveness of our approach, and our end-to-end framework achieves state-of-the-art results on the CREMA-D and RAVDESS datasets.
INDEX TERMS: Emotion recognition, multimodal data, audiovisual information, augmentation techniques, convolutional neural network, real-time processing.
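The asynchronous pairing described in the abstract can be read as a natural data augmentation: because the audio and video windows need not be aligned, a single labelled recording yields many (audio, video, label) training pairs. The following is a minimal sketch of that idea, assuming fixed-length windows; the helper name and signature are hypothetical, not the paper's actual code:

```python
import random

def desynchronised_pairs(audio, video, label, n_pairs, audio_len, video_len):
    """Sample audio and video windows with independent temporal offsets
    from one labelled recording (hypothetical helper, illustrative only).

    audio, video: per-modality frame sequences from the same clip.
    Each call yields n_pairs (audio_window, video_window, label) tuples,
    so a small annotated set is multiplied many times over.
    """
    pairs = []
    for _ in range(n_pairs):
        # Offsets are drawn independently for each modality, so the
        # two streams are deliberately desynchronised within the clip.
        a0 = random.randint(0, len(audio) - audio_len)
        v0 = random.randint(0, len(video) - video_len)
        pairs.append((audio[a0:a0 + audio_len],
                      video[v0:v0 + video_len],
                      label))
    return pairs
```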
Chest computed tomography (CT) has played a valuable, distinct role in the screening, diagnosis, and follow-up of COVID-19 patients. The quantification of COVID-19 pneumonia on CT has proven to be an important predictor of the treatment course and outcome of the patient, although it remains heavily reliant on the radiologist's subjective perceptions. Here, we show that with the adoption of CT for COVID-19 management, a new type of psychophysical bias has emerged in radiology. A preliminary survey of 40 radiologists and a retrospective analysis of CT data from 109 patients from two hospitals revealed that radiologists overestimated the percentage of lung involvement by 10.23 ± 4.65% and 15.8 ± 6.6%, respectively. In the subsequent randomised controlled trial, artificial intelligence (AI) decision support reduced the absolute overestimation error (P < 0.001) from 9.5 ± 6.6% (No-AI analysis arm, n = 38) to 1.0 ± 5.2% (AI analysis arm, n = 38). These results indicate a human perception bias in radiology that has clinically meaningful effects on the quantitative analysis of COVID-19 on CT. The objectivity of AI was shown to be a valuable complement in mitigating the radiologist's subjectivity, reducing the overestimation tenfold.
Trial registration: https://Clinicaltrial.gov, identifier NCT05282056; date of registration: 01/02/2022.
Emotion recognition plays a pivotal role in affective computing and human-computer interaction. Current technological developments offer ever more possibilities for collecting data about a person's emotional state. In general, humans perceive the emotion conveyed by a subject from the vocal and visual information gathered in the first seconds of interaction. Consequently, the integration of verbal (i.e., speech) and non-verbal (i.e., image) information is the preferred choice in most current approaches to emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition that combines audio-visual modalities from a temporal window, with a different temporal offset for each modality. We show that our proposed method outperforms other methods from the literature as well as human accuracy ratings. The experiments are conducted on the open-access multimodal dataset CREMA-D.
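To make the fusion step concrete, below is a hedged sketch of a feature-level (late) fusion classifier: two small CNN encoders embed the audio and video windows, which may come from different temporal offsets within the same clip, and their concatenation feeds a linear head. The layer sizes and module names are illustrative assumptions, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Illustrative late-fusion classifier; the encoder shapes below
    are assumptions, not the paper's exact architecture."""

    def __init__(self, n_classes=6):
        super().__init__()
        # Audio branch: 1-D CNN over a waveform/spectrogram-like
        # input of shape (batch, 1, time).
        self.audio_enc = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        # Video branch: 2-D CNN over a face crop of shape
        # (batch, 3, H, W).
        self.video_enc = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Concatenated per-modality embeddings feed one classifier.
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, audio, video):
        # The audio and video windows may be taken at different
        # temporal offsets; fusion happens at the feature level.
        return self.head(torch.cat(
            [self.audio_enc(audio), self.video_enc(video)], dim=1))
```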