Strong and Simple Baselines for Multimodal Utterance Embeddings

Liang, Paul Pu; Lim, Yao Chong; Tsai, Yao-Hung Hubert; Salakhutdinov, Ruslan; Morency, Louis-Philippe

doi:10.18653/v1/n19-1267

Cited by 28 publications

(18 citation statements)

References 44 publications

“…Our text corpora originate from the following five sources: 1) WikiText-2 (Merity et al, 2017a), a dataset of formally written Wikipedia articles (we only use the first 10% of WikiText-2 which we found to be sufficient to capture formally written text), 2) Stanford Sentiment Treebank (Socher et al, 2013), a collection of 10000 polarized written movie reviews, 3) Reddit data collected from discussion forums related to politics, electronics, and relationships, 4) MELD (Poria et al, 2019), a large-scale multimodal multi-party emotional dialog dataset collected from the TV-series Friends, and 5) POM (Park et al, 2014), a dataset of spoken review videos collected across 1,000 individuals spanning multiple topics. These datasets have been the subject of recent research in language understanding (Merity et al, 2017b; Liu et al, 2019) and multimodal human language (Liang et al, 2018, 2019). Table 2 summarizes these datasets.…”

Section: Sent-debias (mentioning)

confidence: 99%

Towards Debiasing Sentence Representations

Liang, Li, Zheng et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self Cite

As natural language processing methods are increasingly deployed in real-world scenarios such as healthcare, legal systems, and social science, it becomes necessary to recognize the role they potentially play in shaping social biases and stereotypes. Previous work has revealed the presence of social biases in widely used word embeddings involving gender, race, religion, and other social constructs. While some methods were proposed to debias these word-level embeddings, there is a need to perform debiasing at the sentence-level given the recent shift towards new contextualized sentence representations such as ELMo and BERT. In this paper, we investigate the presence of social biases in sentence-level representations and propose a new method, SENT-DEBIAS, to reduce these biases. We show that SENT-DEBIAS is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks such as sentiment analysis, linguistic acceptability, and natural language understanding. We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.

“…There exist various exciting recent work on improved multimodal fusion techniques (Liang et al, 2019a; Pham et al, 2019; Baltrušaitis et al, 2019). In addition to the simplified feature and modality concatenations, we plan to explore some of these promising tensor-based multimodal fusion networks (Liu et al, 2018; Liang et al, 2019b; Tsai et al, 2019) for more robust intent classification on AMIE dataset as future work.…”

Section: Discussion (mentioning)

confidence: 99%

Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents

Okur, Kumar, Sahay et al. 2020

Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)

Building multimodal dialogue understanding capabilities situated in the in-cabin context is crucial to enhance passenger comfort in autonomous vehicle (AV) interaction systems. To this end, understanding passenger intents from spoken interactions and vehicle vision systems is an important building block for developing contextual and visually grounded conversational agents for AV. Towards this goal, we explore AMIE (Automated-vehicle Multimodal In-cabin Experience), the in-cabin agent responsible for handling multimodal passenger-vehicle interactions. In this work, we discuss the benefits of multimodal understanding of in-cabin utterances by incorporating verbal/language input together with the non-verbal/acoustic and visual input from inside and outside the vehicle. Our experimental results outperformed text-only baselines as we achieved improved performances for intent detection with multimodal approach.

Section: Discussion (mentioning)

confidence: 99%

Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)

Zadeh, Morency, Liang et al. 2020

Understanding expressed sentiment and emotions are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis. In addition to using the Transformer architecture, our approach relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed solution has also been submitted to the ACL20: Second Grand-Challenge on Multimodal Language to be evaluated on the CMU-MOSEI dataset. The code to replicate the presented experiments is open-source.
