As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. In the present research, quantitative modeling is presented on the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and tasks varying in difficulty. Using a new dual-wizard simulation method, data was collected from twelve adults as they interacted with a map-based task using multimodal speech and pen input. Analyses based on over 1600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their intermodal lag or signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework and insights into multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.
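The distinction between sequential and simultaneous integration can be illustrated with a minimal sketch. The abstract does not give the operational definitions used in the study, so everything here is an assumption made for illustration: the names `Signal`, `co_timing`, and `dominant_pattern` are hypothetical, a construction is treated as simultaneous whenever the speech and pen signals overlap in time at all, and a user's dominant pattern is taken to be the majority label across their constructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One input signal (speech or pen) with start and end times in seconds."""
    start: float
    end: float


def co_timing(speech: Signal, pen: Signal) -> tuple[str, float]:
    """Label one multimodal construction and return the size of its co-timing.

    Assumption: any temporal overlap counts as "simultaneous"; otherwise the
    construction is "sequential" and the gap between signals is the lag.
    """
    overlap = min(speech.end, pen.end) - max(speech.start, pen.start)
    if overlap > 0:
        return ("simultaneous", overlap)  # seconds of signal overlap
    return ("sequential", -overlap)       # seconds of intermodal lag


def dominant_pattern(constructions: list[tuple[Signal, Signal]]) -> str:
    """Majority label over a user's constructions (ties fall to "sequential")."""
    labels = [co_timing(speech, pen)[0] for speech, pen in constructions]
    simultaneous = labels.count("simultaneous")
    return "simultaneous" if simultaneous > len(labels) - simultaneous else "sequential"
```

Under these assumptions, the entrenchment described in the abstract would appear as a growing second value from `co_timing` over a session: more overlap for simultaneous integrators, more lag for sequential ones.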
This research investigates the design and performance of the Speech Graffiti interface for spoken interaction with simple machines. Speech Graffiti is a standardized interface designed to address issues inherent in the current state-of-the-art in spoken dialog systems such as high word-error rates and the difficulty of developing natural language systems. This article describes the general characteristics of Speech Graffiti, provides examples of its use, and describes other aspects of the system such as the development toolkit. We also present results from a user study comparing Speech Graffiti with a natural language dialog system. These results show that users rated Speech Graffiti significantly better in several assessment categories. Participants completed approximately the same number of tasks with both systems, and although Speech Graffiti users often took more turns to complete tasks than natural language interface users, they completed tasks in slightly less time.
Speech-based interfaces offer the promise of simple human-computer communication, yet the current state-of-the-art often produces inefficient interactions. Many inefficiencies are caused by understanding or recognition errors. Such errors can be minimized by designing interaction protocols in which users are required to speak in a standardized way, but this requirement presents additional difficulties: this way of speaking can be unnatural for users, and in order to learn the standardized interface, users must spend time in tutorial mode rather than in task mode. I propose a strategy of shaping that helps users adapt their interaction to match what the system understands best, thereby reducing the chance for misunderstandings and improving interaction efficiency.
Speech Graffiti is a standardized interaction protocol for spoken dialog systems designed to address some common difficulties with ASR. We have proposed a strategy of shaping to help users adapt their interaction to match what the system understands best, thereby reducing the chance for misunderstandings and improving interaction efficiency. In this paper we report on an evaluation of our initial implementation of shaping in Speech Graffiti, noting that our baseline strategy was not as powerful as expected, and discussing proposed changes to improve its effectiveness.