Multimodal emotion recognition based on audio and text by using hybrid attention networks

Zhang, Shiqing; Yang, Yijiao; Chen, Chen; Liu, Ruixin; Xin, Tao; Guo, Wenping; Xu, Yicheng; Zhao, Xiaoming

doi:10.1016/j.bspc.2023.105052

Cited by 25 publications

(1 citation statement)

References 65 publications


“…The transformer based networks (Dosovitskiy et al, 2020) utilized the self-attention mechanism to build long-term relationships of dependency and could obtain competitive results in image recognition. It was noted that the transformer-based model [29][30][31][32] had mainly focused on improving the ability to extract the global context information and ignored the detailed information. MLP-Mixer [33] showed that pure MLP-based networks could achieve competitive performance in image segmentation since MLP can replace the self-attention mechanism in some extent.…”

Section: Introduction (mentioning)

confidence: 99%

AMS-MLP: Adaptive Multi-Scale MLP Network with Multi-Scale Context Relation Decoder for Pepper Leaf Segmentation

Fang, Jiang, Liu et al. 2024

Preprint


Pepper leaf segmentation plays a crucial role in monitoring pepper leaf diseases in various backgrounds and ensuring the healthy growth of peppers. However, existing transformer-based segmentation methods suffer from computational inefficiency, excessive parameterization, and limited utilization of edge information. To tackle these challenges, we propose an adaptive multi-scale MLP framework, named AMS-MLP, which combines the multi-path aggregation module (MPAM) and the multi-scale context relation mask module (MCRD) to refine the object boundaries in pepper leaf segmentation. AMS-MLP consists of an encoder-based network, an adaptive multi-scale MLP (AM-MLP) module, and a decoder network. In the encoder network, the MPAM module effectively fuses five-scale features to generate a single-channel mask, improving the accuracy of pepper leaf boundary extraction. The AM-MLP module divides the input features into two branches: the global multi-scale MLP branch captures long-range dependencies between image information, while the local multi-scale MLP branch focuses on extracting local feature maps. An adaptive attention mechanism is designed to dynamically adjust the weights of global and local features. The decoder network incorporates the MCRD module into the convolutional layer, enhancing the extraction of boundary features. To verify the performance of the proposed method, we conducted extensive experiments on three pepper leaf datasets with different backgrounds. The results demonstrate mIoU scores of 97.39%, 96.91%, and 97.91%, as well as F1 scores of 98.29%, 97.86%, and 98.51%, respectively. Comparative analysis with U-Net and state-of-the-art models reveals that the proposed method dramatically improves the accuracy and efficiency of pepper leaf image segmentation.


Multimodal emotion recognition: A comprehensive review, trends, and challenges

Ramaswamy, Palaniswamy 2024

WIREs Data Min & Knowl


Automatic emotion recognition is a burgeoning field of research and has its roots in psychology and cognitive science. This article comprehensively reviews multimodal emotion recognition, covering various aspects such as emotion theories, discrete and dimensional models, emotional response systems, datasets, and current trends. This article reviewed 179 multimodal emotion recognition literature papers from 2017 to 2023 to reflect on the current trends in multimodal affective computing. This article covers various modalities used in emotion recognition based on the emotional response system under four categories: subjective experience comprising text and self‐report; peripheral physiology comprising electrodermal, cardiovascular, facial muscle, and respiration activity; central physiology comprising EEG, neuroimaging, and EOG; behavior comprising facial, vocal, whole‐body behavior, and observer ratings. This review summarizes the measures and behavior of each modality under various emotional states. This article provides an extensive list of multimodal datasets and their unique characteristics. The recent advances in multimodal emotion recognition are grouped based on the research focus areas such as emotion elicitation strategy, data collection and handling, the impact of culture and modality on multimodal emotion recognition systems, feature extraction, feature selection, alignment of signals across the modalities, and fusion strategies. The recent multimodal fusion strategies are detailed in this article, as extracting shared representations of different modalities, removing redundant features from different modalities, and learning critical features from each modality are crucial for multimodal emotion recognition. This article summarizes the strengths and weaknesses of multimodal emotion recognition based on the review outcome, along with challenges and future work in multimodal emotion recognition. 
This article aims to serve as a lucid introduction, covering all aspects of multimodal emotion recognition for novices. This article is categorized under: Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction; Technologies > Cognitive Computing; Technologies > Artificial Intelligence


SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering

Xue, Zhu 2024

Multimedia Systems
