Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels, which neglects the diversity of each modality and fails to capture richer semantic information for each label from different perspectives. Besides, the associated relationships between modalities and labels have not been fully exploited. In this paper, we propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), aiming to refine multi-modal representations and enhance the discriminative capacity of each label. Specifically, we design an adversarial multi-modal refinement module to sufficiently explore the commonality among different modalities and strengthen the diversity of each modality. To further exploit label-modal dependence, we devise a BERT-like cross-modal encoder to gradually fuse private and common modality representations in a granularity-descending manner, as well as a label-guided decoder to adaptively generate a tailored representation for each label under the guidance of label semantics. In addition, we conduct experiments on the benchmark MMER dataset CMU-MOSEI in both aligned and unaligned settings, which demonstrate the superiority of TAILOR over state-of-the-art methods. Code is available at https://github.com/kniter1/TAILOR.
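To make the abstract's architecture concrete, below is a minimal PyTorch sketch of how private/common modality refinement and a label-guided decoder might be wired together. All dimensions, module names, and the simple concatenation standing in for the BERT-like cross-modal encoder are illustrative assumptions, not the authors' implementation (see their repository for the actual code).

```python
# Hypothetical sketch of TAILOR-style components; feature dims, d_model=256,
# n_labels=6, and the fusion step are assumptions, not the paper's settings.
import torch
import torch.nn as nn


class ModalityRefinement(nn.Module):
    """Private + common encoders per modality; a discriminator (used
    adversarially during training) pushes common features to be
    modality-invariant while private encoders keep modality-specific cues."""

    def __init__(self, in_dims, d_model=256):
        super().__init__()
        self.private = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)
        self.common = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)
        # Predicts which modality a common feature came from; the encoders
        # are trained to fool it (min-max / gradient reversal, omitted here).
        self.discriminator = nn.Linear(d_model, len(in_dims))

    def forward(self, feats):  # feats: list of (batch, seq, in_dim) tensors
        priv = [enc(x) for enc, x in zip(self.private, feats)]
        comm = [enc(x) for enc, x in zip(self.common, feats)]
        return priv, comm


class LabelGuidedDecoder(nn.Module):
    """Learnable label embeddings attend over fused modality tokens, yielding
    one tailored representation (and one logit) per emotion label."""

    def __init__(self, n_labels=6, d_model=256, n_heads=4):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(n_labels, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, 1)

    def forward(self, fused):  # fused: (batch, seq, d_model)
        queries = self.label_emb.unsqueeze(0).expand(fused.size(0), -1, -1)
        label_repr, _ = self.cross_attn(queries, fused, fused)
        return self.classifier(label_repr).squeeze(-1)  # (batch, n_labels)


# Toy forward pass with random visual/audio/text features.
refine = ModalityRefinement(in_dims=[35, 74, 300])
decoder = LabelGuidedDecoder()
feats = [torch.randn(2, 50, d) for d in (35, 74, 300)]
priv, comm = refine(feats)
fused = torch.cat(priv + comm, dim=1)   # stand-in for the cross-modal encoder
logits = decoder(fused)                 # per-label emotion logits
```

In the actual model the concatenation above would be replaced by the granularity-descending cross-modal encoder, and multi-label binary cross-entropy over the per-label logits would drive training.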
Online streaming is an emerging market that attracts much attention. Assessing gaming skills from videos is an important task for streaming service providers to discover talented gamers. Service providers need this information to offer customized recommendations and service promotions to their customers. Meanwhile, this is also an important multi-modal machine learning task, since online streaming combines vision, audio and text modalities. In this study, we begin by identifying flaws in the dataset and proceed to clean it manually. We then propose several variants of recent end-to-end models to learn a joint representation of multiple modalities. Through extensive experimentation, we demonstrate the efficacy of our proposals. Moreover, we find that our proposed models are prone to identifying users instead of learning meaningful representations. We propose future work to address this issue.