Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413679
Multimodal Dialogue Systems via Capturing Context-aware Dependencies of Semantic Elements

Cited by 24 publications (16 citation statements). References 29 publications.
“…In addition, Nie et al. [7] devised a multimodal dialog system with multiple decoders, which can generate diverse responses according to the user's intention and adaptively integrate related knowledge. Recently, some studies have resorted to the Transformer [21] to investigate multimodal dialog systems [8], [9], owing to its impressive results on natural language processing (NLP) tasks [10], [11], [12], [22], [23]. For example, He et al. [8] introduced a Transformer-based element-level encoder, which captures the semantic dependencies of multimodal elements (i.e., words and images) via the attention mechanism.…”
Section: Task-oriented Dialog Systems
Confidence: 99%
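The attention mechanism mentioned above can be illustrated with a minimal sketch: scaled dot-product self-attention applied to one mixed sequence of word and image-region embeddings, so every element attends over all others regardless of modality. This is a toy illustration, not the MATE encoder itself; the projection matrices and the 3-word/2-image toy sequence are hypothetical.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of element embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise element affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over all elements
    return w @ V                                     # context-aware element states

# Hypothetical toy input: 3 word embeddings + 2 image-region embeddings, d = 8.
rng = np.random.default_rng(0)
d = 8
X = np.vstack([rng.normal(size=(3, d)),   # word elements
               rng.normal(size=(2, d))])  # image elements
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # each of the 5 elements now mixes word and image information
```

Because the five rows are processed as one sequence, the attention weights let a word embedding draw on image regions and vice versa, which is the sense in which such an encoder captures cross-modal semantic dependencies.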
“…Existing multimodal task-oriented dialog systems mainly adopt the encoder-decoder framework for text response generation. In particular, recent studies have recognized the pivotal role of the knowledge base in multimodal dialog systems and have designed various schemes for incorporating knowledge to enhance the understanding of the user's intention [2], [3], [4], [5], [6], [7], [8], [9]. Although they have achieved significant progress, these research efforts suffer from two key limitations.…”
Section: Introduction
Confidence: 99%
“…It can incorporate different forms of domain knowledge for different intents through intention classification, and generate general responses, knowledge-aware responses, as well as multimodal responses through adaptive decoders. Moreover, building on the Transformer [30], He et al. [13] advanced a multimodal dialog system via capturing context-aware dependencies of semantic elements (MATE). This model uses relevant images and ordinal information in the dialog history to generate context-aware responses in the text response generation task.…”
Section: Multimodal Dialog Systems
Confidence: 99%
“…In (Liao et al., 2018), a chat session is modeled as a reinforcement learning procedure, and a reward is formed to optimize the answer. He et al. (2020) further consider, with a self-attention block, the influence that the order of historical images and text has on the answers. Comparatively, we unify text generation and meme prediction into a single long-sequence procedure and solve them with a cross-modal GPT-based language model.…”
Section: A2 Technical Difference with Other Multimodal Dialogue Models
Confidence: 99%