2022
DOI: 10.1109/taffc.2020.3005660
Adapted Dynamic Memory Network for Emotion Recognition in Conversation

Cited by 47 publications (22 citation statements)
References 59 publications
“…Specifically, a pre-trained 3D-CNN for human action [53] is adopted to extract spatio-temporal features of emotions in videos [54] [55]. Furthermore, a pre-trained 3D-CNN for sports [56] is also applied in a series of studies [7] [8] [57] [58] [59] [60]. ResNet-101 [61], a 3D-CNN architecture pre-trained on the human action video dataset Kinetics [62], is combined with an attention mechanism to form the visual model that extracts the feature representation of the visual stream [63].…”
Section: Preprocess (mentioning)
confidence: 99%
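The pipeline quoted above is a standard feature-extraction recipe: a frozen 3D-CNN turns each video clip into a vector, and an attention mechanism pools the clips into one utterance-level visual representation. A minimal sketch of the pooling step, assuming the per-clip features are already extracted (the `attention_pool` function, the query vector, and the random inputs are illustrative, not taken from the cited papers):

```python
import numpy as np

def attention_pool(clip_feats, query):
    """Attention-weighted pooling of per-clip 3D-CNN features.

    clip_feats: (T, D) array, one row of visual features per video clip.
    query: (D,) query vector (in practice a learned parameter).
    Returns a single (D,) utterance-level visual representation.
    """
    scores = clip_feats @ query                  # (T,) relevance scores
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ clip_feats                  # convex combination of clips

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 512))   # e.g. 8 clips, 512-d features each
query = rng.normal(size=512)
visual_repr = attention_pool(feats, query)
```

Because the weights form a softmax, the pooled vector stays inside the convex hull of the clip features regardless of how many clips the video is split into.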
“…Consider a scenario in which intelligent robots in customer-service systems are required to recognize a customer's emotions after the customer speaks in a dialogue: two or more parties take part in the dialogue, and a party can be influenced either by its own past states or by the states of the other parties. As a result, a series of novel methods have been proposed for ERC [4] [5] [6] [7] [8] [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Specifically, a pre-trained 3D-CNN for human action is adopted to extract spatio-temporal features of emotions in videos [52] [53]. Furthermore, a pre-trained 3D-CNN for sports is also applied in a series of studies [7] [58]; the model consists of two stages, a Two-Stream Convolutional Neural Network and a Gated Recurrent Unit network, to capture micro- and macro-motion, respectively. The feature representation of a snippet, i.e.…”
Section: Preprocess (mentioning)
confidence: 99%
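The second stage mentioned in this excerpt runs a GRU over per-snippet motion features. As a rough sketch of that recurrence (the weights and inputs below are random stand-ins, not the cited model's parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU update over a snippet feature vector x.

    W: (3, D_in, D_h), U: (3, D_h, D_h), b: (3, D_h) hold the
    update-gate, reset-gate, and candidate parameters respectively.
    """
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])             # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])             # reset gate
    h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1 - z) * h + z * h_cand                     # blend old and new

rng = np.random.default_rng(1)
d_in, d_h, steps = 64, 32, 10
W = rng.normal(scale=0.1, size=(3, d_in, d_h))
U = rng.normal(scale=0.1, size=(3, d_h, d_h))
b = np.zeros((3, d_h))
h = np.zeros(d_h)
for x in rng.normal(size=(steps, d_in)):  # sequence of snippet features
    h = gru_step(x, h, W, U, b)           # h summarizes the snippets so far
```

The final hidden state `h` serves as the sequence-level summary that downstream emotion classifiers would consume; because the update is a convex blend with a tanh candidate, every coordinate of `h` stays in (-1, 1).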
“…[6] propose a model for real-time emotion detection in conversations and experiment on bimodal and multimodal data, but do not cover the unimodal setting. [7] and [87] evaluate the performance of their models using different modality combinations but lack comparisons against baselines in the unimodal and bimodal settings. Liang et al. [89] evaluate their model and present an empirical analysis in the unimodal, bimodal, and multimodal settings.…”
Section: Unified Model (mentioning)
confidence: 99%
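The unimodal, bimodal, and multimodal evaluation settings this excerpt compares can be enumerated mechanically. A small sketch (the modality names are generic placeholders, not tied to any cited model):

```python
from itertools import combinations

MODALITIES = ("text", "audio", "visual")

def modality_settings(mods=MODALITIES):
    """All non-empty modality subsets: 3 unimodal, 3 bimodal, 1 trimodal."""
    return [c for r in range(1, len(mods) + 1) for c in combinations(mods, r)]

settings = modality_settings()  # 7 evaluation settings in total
```

Evaluating a model under all seven subsets is what lets papers report the unimodal and bimodal baselines that the excerpt notes are sometimes missing.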