Proceedings of the 4th Workshop on NLP for Conversational AI 2022
DOI: 10.18653/v1/2022.nlp4convai-1.12
Multimodal Conversational AI: A Survey of Datasets and Approaches

Abstract: As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). We use these modalities, particularly sight and touch, to convey and interpret specific meanings. Multimodal expressions are central to conversations; a rich set of modalities amplify and often compensate for each other. A multimodal conversational AI system answers questions, fulfills tasks, and emulates human conversations by understanding and expressing itself via multiple modalities. This paper mo…

Cited by 20 publications (9 citation statements)
References 131 publications
“…hallucinate facts is well documented [24]. As they are primarily text-based, their integration with non-linguistic knowledge sources may also be challenging, although there exist a number of approaches integrating neural models with knowledge bases [17,20] or images/videos [32,49]. Finally, even though few-shot learning approaches may be employed to mitigate problems of data scarcity [42,54], their portability to scenarios with no or limited data remains difficult.…”
Section: Modular vs End-to-end SDS
confidence: 99%
“…So, unexploited potential exists in the study of multimodal conversational agents, which let users and conversational agents converse using both human language and visual information to be more realistic, human-like, and engaging. Sundar and Heck [10] have defined and mathematically formulated the goal of multimodal conversational study. They suggested four basic problems in multimodal conversational systems: disambiguation, response generation, coreference resolution, and dialogue state tracking.…”
Section: Emoji Representation and Approaches Plays a Key Role in Mult...
confidence: 99%
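The four subtasks named in the statement above (disambiguation, coreference resolution, dialogue state tracking, and response generation) can be illustrated with a minimal toy sketch of a multimodal dialogue turn. All class names, field names, and the example objects below are illustrative assumptions for exposition; they are not taken from the survey or from Sundar and Heck (2022).

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not the survey's formulation: a dialogue turn
# pairs a user utterance with objects perceived in the visual scene.

@dataclass
class Turn:
    utterance: str                # e.g. "How much is that one?"
    visual_objects: list          # object IDs detected in the image
    coreferences: dict = field(default_factory=dict)  # mention -> object ID

@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)  # slot name -> value

def track_state(state: DialogueState, turn: Turn) -> DialogueState:
    """Toy tracker: resolve the ambiguous mention 'that one' to a visual
    object (coreference resolution / disambiguation) and record it as
    the focused item (dialogue state tracking)."""
    mention = "that one"
    if mention in turn.utterance and turn.visual_objects:
        obj = turn.coreferences.get(mention, turn.visual_objects[0])
        state.slots["focus_object"] = obj
    return state

def generate_response(state: DialogueState) -> str:
    """Toy response generation conditioned on the tracked state."""
    obj = state.slots.get("focus_object")
    return f"You are asking about {obj}." if obj else "Which item do you mean?"

turn = Turn("How much is that one?", ["jacket_3", "hat_1"],
            {"that one": "jacket_3"})
state = track_state(DialogueState(), turn)
print(generate_response(state))  # -> You are asking about jacket_3.
```

In a real multimodal system each step would be a learned model (e.g. a vision-language encoder grounding mentions in detected objects); the sketch only shows how the four subtasks compose in one turn.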
“…So, they cannot understand the mood or tone of the user [9]. One more drawback of current conversational agents is that they converse only with language (text) whereas humans communicate with different modalities or senses [10]. Figure 1 shows an overview of the current state of conversational agents.…”
Section: Introduction
confidence: 99%
“…At the core of these efforts, the ability to understand language and vision, as well as integrate both representations to align the linguistic expressions in the dialogue with the relevant visual concepts or perceived objects, is the key to multimodal dialogue understanding (Landragin, 2006;Loáiciga et al, 2021b,a;Kottur et al, 2018;Utescher and Zarrieß, 2021;Sundar and Heck, 2022;Dai et al, 2021).…”
Section: Introduction
confidence: 99%