2018
DOI: 10.1609/aaai.v32i1.12024

Multi-attention Recurrent Network for Human Communication Comprehension

Abstract: Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape the communication. In this paper, we present a novel neural arch…

Cited by 326 publications (83 citation statements) | References 20 publications

“…et al. (2017) propose the TFN model, a multimodal method that fuses modalities via the tensor outer product. Liang et al. (2018) propose a model that uses a multi-level attention mechanism to extract interaction information across modalities. Cai et al. (2019) propose a hierarchical fusion model that models image and text information for irony recognition.…”
Section: So We Propose a New Multi-modal Emotional…
citation type: mentioning
confidence: 99%
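For readers unfamiliar with TFN, the "tensor outer product" fusion mentioned above can be sketched in a few lines. This is a minimal illustration under assumed embedding sizes, not the authors' code; the function name and dimensions are hypothetical.

```python
import torch

def tensor_fusion(z_l, z_v, z_a):
    """TFN-style fusion sketch: append a constant 1 to each unimodal
    embedding, then take their 3-way outer product so that unimodal,
    bimodal, and trimodal interaction terms all appear in one tensor."""
    one = torch.ones(1)
    z_l = torch.cat([z_l, one])  # language embedding -> (d_l + 1,)
    z_v = torch.cat([z_v, one])  # vision embedding   -> (d_v + 1,)
    z_a = torch.cat([z_a, one])  # acoustic embedding -> (d_a + 1,)
    # Outer product of shape (d_l+1, d_v+1, d_a+1), flattened for an MLP head.
    fused = torch.einsum('i,j,k->ijk', z_l, z_v, z_a)
    return fused.flatten()

# Example with assumed per-modality embedding sizes 32, 16, and 8.
fused = tensor_fusion(torch.randn(32), torch.randn(16), torch.randn(8))
print(fused.shape)  # torch.Size([5049]) = 33 * 17 * 9
```

The appended 1s are what let the lower-order (unimodal and bimodal) terms survive inside the trimodal product; without them the outer product would capture only three-way interactions.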
“…Our proposed method outperforms all three baselines, obtaining 3.54% and 3.25% absolute improvement on WA and UA respectively over the prior state of the art. Sentiment Analysis: We adopt the CMU-MOSEI (Zadeh et al. 2018) dataset to evaluate the sentiment analysis task, which aims to predict the degree of positive and negative sentiment. We follow the same experimental protocol as MulT (Tsai et al. 2019), with the same train/test data split.…”
Section: Fine-tuning On Downstream Tasks
citation type: mentioning
confidence: 99%
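WA and UA in this statement refer to weighted and unweighted accuracy, the standard pair of metrics in speech emotion recognition. A minimal sketch of how the two differ (variable names are illustrative, not from the cited paper):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    # WA: overall fraction correct; dominated by the most frequent classes.
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def unweighted_accuracy(y_true, y_pred):
    # UA: recall computed per class, then averaged so every class counts equally.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return np.mean(recalls)

y_true = [0, 0, 0, 1, 1, 2]  # imbalanced toy labels
y_pred = [0, 0, 1, 1, 0, 2]
print(weighted_accuracy(y_true, y_pred))    # 0.667 (4 of 6 correct)
print(unweighted_accuracy(y_true, y_pred))  # 0.722 (mean of 2/3, 1/2, 1/1)
```

UA penalizes models that ignore minority emotion classes, while WA can be inflated by the majority class, which is why both are usually reported together.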
“…CMU-MOSI (Wöllmer et al., 2013) is a sentiment prediction task on a set of short YouTube video clips. CMU-MOSEI (Zadeh et al., 2018b) is a similar dataset consisting of around 23k review videos taken from YouTube. The output in both cases is a sentiment score in [−3, 3].…”
Section: Datasets
citation type: mentioning
confidence: 99%
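On CMU-MOSI and CMU-MOSEI, predictions over the continuous [−3, 3] sentiment scale are commonly reported as MAE together with a binarized (positive vs. negative) accuracy. A minimal sketch with hypothetical names, assuming the common convention of excluding exactly-neutral (0) labels from the binary accuracy:

```python
import numpy as np

def mosei_sentiment_metrics(y_true, y_pred):
    # MAE over the continuous [-3, 3] sentiment scale.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    # Binary accuracy: sign agreement, skipping exactly-neutral labels.
    mask = y_true != 0
    acc2 = np.mean((y_pred[mask] > 0) == (y_true[mask] > 0))
    return mae, acc2

mae, acc2 = mosei_sentiment_metrics([2.4, -1.2, 0.8], [1.9, -0.4, -0.2])
print(mae, acc2)  # 0.767, 0.667
```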