Emotion-Aware Multimodal Pre-training for Image-Grounded Emotional Response Generation

Tian, Zhiliang; Wen, Zhuoer; Wu, Zhenghao; Song, Yiping; Tang, Jintao; Li, Dongsheng; Zhang, Nevin L.

doi:10.1007/978-3-031-00129-1_1

Cited by 4 publications (2 citation statements)

References 55 publications (82 reference statements)

“…In their work, emotions were classified into two broad categories, namely, positive and negative, to facilitate a simplified emotional understanding. In a distinct study, Tian et al [10] put forth a multitask learning framework in which tasks such as image sentiment sequential labeling, image sentiment classification, and text generation were learned simultaneously. This was accomplished using a pretrained model specifically designed to generate textual content that effectively captures the user's emotions.…”

Section: Emotional Dialogue System (mentioning)

confidence: 99%

“…In the realm of academia, researchers have extensively investigated dialogue models, such as those presented in Shuster et al [4,5], and have proposed emotion-enhanced models, as discussed in Wei et al [6] and Li et al [7]. Specifically, to address the limitations of single text generation models, multimodal dialogue models capable of processing both textual and video information have been proposed, including the works of Fung et al [8], Huber et al [9], and Tian et al [10]. More importantly, Shen et al [11] designed ViDA-MAN, a digital human agent for multimodal interaction, which provides real-time audiovisual responses to users through voice queries.…”

Section: Introduction (mentioning)

confidence: 99%

Beyond Words: An Intelligent Human-Machine Dialogue System with Multimodal Generation and Emotional Comprehension

Zhao, Cheng, Huang et al. 2023

International Journal of Intelligent Systems

Intelligent service robots have become an indispensable aspect of modern-day society, playing a crucial role in various domains ranging from healthcare to hospitality. Among these robotic systems, human-machine dialogue systems are particularly noteworthy as they deliver both auditory and visual services to users, effectively bridging the communication gap between humans and machines. Despite their utility, the majority of existing approaches to these systems primarily concentrate on augmenting the logical coherence of the system’s responses, inadvertently neglecting the significance of user emotions in shaping a comprehensive communication experience. To tackle this shortcoming, we propose the development of an innovative human-machine dialogue system that is both intelligent and emotionally sensitive, employing multimodal generation techniques. This system is architecturally comprised of three components: (1) data collection and processing, responsible for gathering and preparing relevant information, (2) a dialogue engine, which generates contextually appropriate responses, and (3) an interaction module, responsible for facilitating the communication interface between users and the system. To validate our proposed approach, we have constructed a prototype system and conducted an evaluation of the performance of the core dialogue engine by utilizing an open dataset. The results of our study indicate that our system demonstrates a remarkable level of multimodal generation response, ultimately offering a more human-like dialogue experience.

Section: Emotional Dialogue System (mentioning)

confidence: 99%