Interactions with embodied conversational agents can be enhanced using human-like co-speech gestures. Traditionally, rule-based co-speech gesture mapping has been used for this purpose. However, creating such a mapping is laborious and typically requires human experts. Moreover, manually created mappings tend to be limited in coverage and are therefore prone to generating repetitive gestures. In this article, we present an approach that automates the generation of a rule-based co-speech gesture mapping from a publicly available large-scale video dataset without the intervention of human experts. At run time, word embeddings are used to search the rules so that the retrieved rule is semantically aware, meaningful, and accurate. The evaluation indicated that our method achieved performance comparable to the mapping created manually by human experts, while activating a greater variety of gestures. Moreover, synergy effects were observed in users' perception of the generated co-speech gestures when our mapping was combined with the manual one.
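To illustrate the run-time rule search described above, the following is a minimal sketch of embedding-based rule retrieval, assuming a pre-trained word-embedding lookup and a rule table keyed by trigger words. All names, vectors, and the similarity threshold are hypothetical and stand in for the paper's actual mined rules and embeddings.

```python
import numpy as np

# Hypothetical rule table: trigger word -> gesture clip identifier.
# In the described pipeline these rules would be mined automatically from
# video; here they are hard-coded toy examples.
RULES = {
    "big": "gesture_wide_arms",
    "small": "gesture_pinch",
    "hello": "gesture_wave",
}

# Toy word-embedding lookup; a real system would use pre-trained vectors
# (e.g., word2vec or GloVe). The values below are made up for illustration.
EMBEDDINGS = {
    "big": np.array([0.9, 0.1, 0.0]),
    "huge": np.array([0.85, 0.15, 0.05]),
    "small": np.array([0.1, 0.9, 0.0]),
    "tiny": np.array([0.12, 0.88, 0.02]),
    "hello": np.array([0.0, 0.1, 0.95]),
    "hi": np.array([0.05, 0.12, 0.9]),
}

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_rule(word, threshold=0.8):
    """Return the gesture whose trigger word is semantically closest to `word`.

    Exact matches are used directly; otherwise the nearest trigger in
    embedding space is chosen, provided its cosine similarity exceeds
    `threshold` (a hypothetical cutoff).
    """
    if word in RULES:
        return RULES[word]
    if word not in EMBEDDINGS:
        return None
    query = EMBEDDINGS[word]
    best_trigger, best_sim = None, -1.0
    for trigger in RULES:
        sim = cosine(query, EMBEDDINGS[trigger])
        if sim > best_sim:
            best_trigger, best_sim = trigger, sim
    return RULES[best_trigger] if best_sim >= threshold else None

print(find_rule("huge"))  # -> "gesture_wide_arms", matched via embedding similarity
print(find_rule("hi"))    # -> "gesture_wave"
```

The point of the embedding step is that words absent from the mined rule table (e.g., "huge") can still activate a semantically appropriate gesture, which is what allows the automatic mapping to trigger a wider variety of gestures than an exact-match lookup would.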