Incorporating Interpersonal Synchronization Features for Automatic Emotion Recognition from Visual and Audio Data during Communication

Quan, Jingyu; Miyake, Yoichi; Nozawa, Takayuki

doi:10.3390/s21165317

Cited by 12 publications

(14 citation statements)

References 79 publications


Section: Comparative Analysis Results (mentioning)

confidence: 98%

“…For the case of K-EmoCon comparative analysis (Table 2), the AffECt model outperformed the state-of-the-art ones [50], [53] by a considerable margin, demonstrating its effectiveness in EC arousal and valence classification. The highest accuracy achieved by the AffECt model on EC arousal [50], [53] (max of 75.1% (arousal) and 68.3% (valence) using CNN-Bi-LSTM [50]). Additionally, the highest sensitivity achieved by the AffECt model on EC arousal and valence classification using KNN is 91.9% and 92.5%, respectively, surpassing the highest sensitivity achieved by the other models [50], [53] (max of 79.2% (arousal) and 72.3% (valence) using CNN-BiLSTM [50]).…”


Novel Speech-Based Emotion Climate Recognition in Peers’ Conversations Incorporating Affect Dynamics and Temporal Convolutional Neural Networks

Alhussein, Alkhodari, Khandoker, et al. 2023

Preprint


Peers’ conversation provides a domain of rich emotional information. Apart from facial and gestural expressions, this information is also naturally conveyed via peers’ speech, contributing to the establishment of a dynamic emotion climate (EC) during their conversational interaction. Recognition of EC could provide an additional source for understanding peers’ social interaction and behavior on top of peers’ actual conversational content. Here, we propose a novel approach for speech-based EC recognition, namely AffECt, by combining peers’ complex affect dynamics (AD) with deep features extracted from speech signals using Temporal Convolutional Neural Networks (TCNNs). AffECt was tested and cross-validated on data drawn from three open datasets, i.e., K-EmoCon, IEMOCAP, and SEWA, in terms of EC arousal/valence level classification. The experimental results show that AffECt achieves EC classification accuracy of up to 83.3% and 80.2% for arousal and valence, respectively, clearly surpassing the results reported in the literature and exhibiting robust performance across different languages. Moreover, there is a distinct improvement when the AD are combined with the TCNN, compared to the baseline deep learning approaches. These results demonstrate the effectiveness of AffECt in speech-based EC recognition, paving the way for many applications, e.g., in patients’ group therapy, negotiations, and emotion-aware mobile applications.


“…To avoid data leakage, we did not use data re-sampling approaches [24] to solve the imbalanced data problem as they will change the original dataset itself. Instead, we chose focal loss [16] as the loss function of our model, which can automatically downweight the contribution of easily classified samples and focus on hard misclassified samples by applying a modulating term to the cross-entropy loss.…”

Section: Implementation Details (mentioning)

confidence: 99%
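The focal loss described in the statement above can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function names, the pure-Python formulation, and the default gamma = 2.0 are assumptions made for illustration only. The sketch shows how the modulating term (1 - p_t)^gamma scales the cross-entropy so that confidently correct samples contribute far less than hard ones.

```python
import math


def focal_loss_per_sample(probs, labels, gamma=2.0):
    """Return the focal loss of each sample.

    probs  : list of per-class probability lists, one per sample
    labels : list of integer class ids (the true class of each sample)
    gamma  : focusing parameter (assumed default; gamma = 0 gives plain cross-entropy)
    """
    losses = []
    for row, y in zip(probs, labels):
        # Probability assigned to the true class, clipped to avoid log(0).
        p_t = min(max(row[y], 1e-12), 1.0)
        cross_entropy = -math.log(p_t)
        # Modulating term: close to 0 for easy samples, close to 1 for hard ones.
        modulating = (1.0 - p_t) ** gamma
        losses.append(modulating * cross_entropy)
    return losses


def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss over all samples."""
    per_sample = focal_loss_per_sample(probs, labels, gamma)
    return sum(per_sample) / len(per_sample)


# Hypothetical two-class example: one easy sample (p_t = 0.9), one hard sample (p_t = 0.3).
example_probs = [[0.9, 0.1], [0.3, 0.7]]
example_labels = [0, 0]
easy, hard = focal_loss_per_sample(example_probs, example_labels)
```

With gamma = 2.0 the easy sample's cross-entropy is multiplied by 0.01 and the hard sample's by 0.49, so the hard sample dominates the mean. A class-weighting factor is often added on top of this in practice; it is omitted here because the statement does not mention one.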

“…Considering that the classes of arousal and valence are heavily imbalanced, we did not use recognition accuracy as the evaluation metric like most studies. Instead, following [17,24], we chose the average F1 score (Macro-F1) and unweighted average recall (UAR) as our validation metrics. These metrics give the same importance to each class, and are defined as the mean of class-wise F1 scores and recall scores respectively.…”

Section: Evaluation Metric (mentioning)

confidence: 99%
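The two metrics named in the statement above follow directly from their definitions: Macro-F1 is the mean of class-wise F1 scores and UAR is the mean of class-wise recalls. The sketch below is a plain-Python illustration, not the citing authors' code; the function name is hypothetical, and it assumes that the set of classes is taken from the ground-truth labels and that an undefined precision, recall, or F1 is counted as 0.

```python
def macro_f1_and_uar(y_true, y_pred):
    """Return (Macro-F1, UAR) for two equal-length label sequences."""
    classes = sorted(set(y_true))  # assumption: classes come from the ground truth
    f1_scores, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        f1_scores.append(f1)
        recalls.append(recall)
    # Every class counts equally, however many samples it has.
    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)


# Hypothetical imbalanced example: 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1, uar = macro_f1_and_uar(y_true, y_pred)
```

In this example accuracy is 0.8 even though the minority class is never recognised, while UAR is 0.5 and Macro-F1 is 4/9, which is why class-balanced metrics are preferred on heavily imbalanced arousal and valence labels.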

Mobile Emotion Recognition via Multiple Physiological Signals using Convolution-augmented Transformer

Yang, Tag, et al. 2022

Proceedings of the 2022 International Conference on Multimedia Retrieval


Recognising and monitoring emotional states plays a crucial role in mental health and well-being management. Importantly, with the widespread adoption of smart mobile and wearable devices, it has become easier to collect long-term and granular emotion-related physiological data passively, continuously, and remotely. This creates new opportunities to help individuals manage their emotions and well-being in a less intrusive manner using off-the-shelf low-cost devices. Pervasive emotion recognition based on physiological signals is, however, still challenging because of the difficulty of efficiently extracting high-order correlations between physiological signals and users' emotional states. In this paper, we propose a novel end-to-end emotion recognition system based on a convolution-augmented transformer architecture. Specifically, it can recognise users' emotions on the dimensions of arousal and valence by learning both the global and local fine-grained associations and dependencies within and across multimodal physiological data (including blood volume pulse, electrodermal activity, heart rate, and skin temperature). We extensively evaluated the performance of our model using the K-EmoCon dataset, which was acquired in naturalistic conversations using off-the-shelf devices and contains spontaneous emotion data. Our results demonstrate that our approach outperforms the baselines and achieves state-of-the-art or competitive performance. We also demonstrate the effectiveness and generalizability of our system on another affective dataset which used affect inducement and commercial physiological sensors.


“…The study presented in [16] by J. Quan, Y. Miyake, and T. Nozawa investigates automatic emotion recognition using visual, audio, and audio-visual features. The authors built two types of emotion recognition models: an individual model and an interpersonal model, the latter capturing interpersonal interaction activities, both verbal and non-verbal.…”

Section: Overview of the Contributions (mentioning)

confidence: 99%

Analytics and Applications of Audio and Image Sensing Techniques

Wieczorkowska

2022

Sensors

