MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

Poria, Soujanya; Hazarika, Devamanyu; Majumder, Navonil; Naik, Gautam; Wang, Zhaoxia; Mihalcea, Rada

doi:10.48550/arxiv.1810.02508

Cited by 76 publications

(101 citation statements)

References 0 publications

Citation statements by type: Supporting, Mentioning (101), Contrasting


“…Each dataset consists of reviews rated on a scale of 1 (strong negative) to 5 (strong positive). Similarly, for ERC, we collect three widely used datasets: DyDa: DailyDialog (Li et al, 2017), IEMOCAP: interactive emotional dyadic motion capture database (Busso et al, 2008), and MELD: Multimodal EmotionLines Dataset (Poria et al, 2018). To demonstrate our methodology, we partition the DyDa dataset into four equal chunks.…”

Section: Methods (mentioning)

confidence: 99%

KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks

Bhardwaj, Vaidya, Poria. 2021. Preprint. Self cite.


Enhancing the user experience is an essential task for application service providers. For instance, two users living wide apart may have different tastes of food. A food recommender mobile application installed on an edge device might want to learn from user feedback (reviews) to satisfy the client's needs pertaining to distinct domains. Retrieving user data comes at the cost of privacy while asking for model parameters trained on a user device becomes space inefficient at a large scale. In this work, we propose an approach to learn a central (global) model from the federation of (local) models which are trained on user-devices, without disclosing the local data or model parameters to the server. We propose a federation mechanism for the problems with natural similarity metric between the labels which commonly appear in natural language understanding (NLU) tasks. To learn the global model, the objective is to minimize the optimal transport cost of the global model's predictions from the confident sum of soft-targets assigned by local models. The confidence (a model weighting scheme) score of a model is defined as the L2 distance of a model's prediction from its probability bias. The method improves the global model's performance over the baseline designed on three NLU tasks with intrinsic label space semantics, i.e., fine-grained sentiment analysis, emotion recognition in conversation, and natural language inference. We make our codes public at https://github.com/declare-lab/sinkhorn-loss.
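As a rough illustration of the weighting and transport-cost ideas in the abstract above, the following Python sketch combines local soft targets by confidence and scores a global prediction against the result. It rests on several assumptions that the abstract does not state: each model's "probability bias" is passed in as a given vector (how it is estimated is not specified here), the labels are ordinal with ground cost |i - j|, and the transport cost uses the closed-form one-dimensional Wasserstein distance rather than a Sinkhorn solver. The function names are hypothetical and are not taken from the linked repository.

```python
import math


def l2_distance(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def confident_soft_target(predictions, biases):
    """Confidence-weighted average of local soft targets.

    Assumption: confidence = L2 distance between a local model's
    prediction and its (externally supplied) probability bias.
    """
    weights = [l2_distance(p, b) for p, b in zip(predictions, biases)]
    total = sum(weights)
    if total == 0:
        # No model is confident: fall back to a plain average.
        weights = [1.0] * len(predictions)
        total = float(len(predictions))
    n_labels = len(predictions[0])
    return [
        sum(w * p[k] for w, p in zip(weights, predictions)) / total
        for k in range(n_labels)
    ]


def ordinal_ot_cost(p, q):
    """Optimal transport cost between two distributions over ordered labels.

    Assumption: ground cost |i - j|, for which the optimal cost equals
    the sum of absolute differences between the cumulative sums.
    """
    cost = 0.0
    running = 0.0
    for a, b in zip(p, q):
        running += a - b
        cost += abs(running)
    return cost


# Two hypothetical local models on a 3-label ordinal task.
local_predictions = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]
local_biases = [[1 / 3, 1 / 3, 1 / 3], [0.2, 0.3, 0.5]]
target = confident_soft_target(local_predictions, local_biases)
global_prediction = [0.6, 0.2, 0.2]
loss = ordinal_ot_cost(global_prediction, target)
```

In this toy run the second model predicts exactly its bias, so its confidence is zero and the target reduces to the first model's prediction. A training loop would minimize `loss` with respect to the global model's parameters, which this sketch leaves out.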


“…In addition, for the contextual information, inspired by the position embedding proposed in Transformer [13], we propose Identity Embedding and add it to the features of each modality, then based on attention mechanism and LSTM [14], the contextual information can be modeled throughout the information flow. The effectiveness of the proposed model is demonstrated by comprehensive experiments on two large and widely used emotional datasets, i.e., the IEMOCAP [15] and the MELD [16]. Our contributions can be summarized as follows:…”

Section: Fusion For Prediction (mentioning)

confidence: 99%

“…This work evaluates the performance of the proposed algorithm on two different datasets which are widely used in MSA research: IEMOCAP [15] and MELD [16].…”

Section: Datasets and Metrics (mentioning)

confidence: 99%

Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis

Zheng, Zhang, Wang et al. 2022.

Preprint


Multimodal sentiment analysis (MSA) is a fundamental complex research problem due to the heterogeneity gap between different modalities and the ambiguity of human emotional expression. Although there have been many successful attempts to construct multimodal representations for MSA, there are still two challenges to be addressed: 1) A more robust multimodal representation needs to be constructed to bridge the heterogeneity gap and cope with the complex multimodal interactions, and 2) the contextual dynamics must be modeled effectively throughout the information flow. In this work, we propose a multimodal representation model based on Mutual information Maximization and Minimization and Identity Embedding (MMMIE). We combine mutual information maximization between modal pairs, and mutual information minimization between input data and corresponding features to mine the modal-invariant and task-related information. Furthermore, Identity Embedding is proposed to prompt the downstream network to perceive the contextual information. Experimental results on two public datasets demonstrate the effectiveness of the proposed model.


“…Emotion recognition in conversation is a popular area in NLP. Many ERC datasets have been scripted and annotated in the past few years, such as IEMOCAP (Busso et al 2008), MELD (Poria et al 2018), DailyDialog (Li et al 2017), EmotionLines (Chen et al 2018) and EmoryNLP (Zahiri and Choi 2018). IEMOCAP, MELD, and EmoryNLP are multimodal datasets, containing acoustic, visual and textual information, while the remaining two datasets are textual.…”

Section: Emotion Recognition In Conversation (mentioning)

confidence: 99%

“…MELD (Poria et al 2018) is a multi-modal emotion classification dataset. It is a multi-party dialogue dataset created from scripts of the Friends TV series.…”

Section: Datasets (mentioning)

confidence: 99%

S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation

Chen, Chong, Xu et al. 2021.

Preprint


Emotion recognition in conversation (ERC) has attracted much attention in recent years for its necessity in widespread applications. Existing ERC methods mostly model the self and inter-speaker context separately, posing a major issue for lacking enough interaction between them. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages to combine the benefits of both Transformer and relational graph convolution network (R-GCN) for better contextual modeling. Firstly, a two-stream conversational Transformer is presented to extract the coarse self and inter-speaker contextual features for each utterance. Then, a speaker and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features guided by a relative positional encoding. Finally, both of the features from the former two stages are input into a conditional random field layer to model the emotion transfer. Extensive experiments demonstrate that our model achieves state-of-the-art performance on three ERC datasets.

