Techniques for information fusion are at the heart of multimodal system design. To develop new user-adaptive approaches for multimodal fusion, the present research investigated the stability and underlying cause of major individual differences that have been documented among users in their multimodal integration patterns. Longitudinal data were collected from 25 adults as they interacted with a map system over six weeks. Analyses of 1,100 multimodal constructions revealed that every user had a dominant integration pattern, either simultaneous or sequential, which was 95-96% consistent and remained stable over time. In addition, coherent behavioral and linguistic differences were identified between these two groups. Whereas performance speed was comparable, sequential integrators made only half as many errors and excelled during new or complex tasks. Sequential integrators also had more precise articulation (e.g., fewer disfluencies), although their speech rate was no slower. Finally, sequential integrators more often adopted terse, direct command-style language, with a smaller and less varied vocabulary, which appeared focused on achieving error-free communication. These distinct interaction patterns are interpreted as deriving from fundamental differences in reflective-impulsive cognitive style. Implications of these findings are discussed for the design of adaptive multimodal systems with substantially improved performance characteristics.
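To make the simultaneous/sequential distinction concrete, the sketch below labels each logged speech-and-pen construction by whether the two signals overlap in time, and reports a user's majority pattern together with its consistency ratio. This is a minimal illustration only; the data structures, function names, and majority rule are assumptions for exposition, not the classification procedure used in the study.

```python
# Minimal sketch, under illustrative assumptions: labeling each multimodal
# construction as "simultaneous" (speech and pen overlap in time) or
# "sequential" (one signal follows the other), then reporting the user's
# dominant pattern and how consistently it occurs. Timestamps are in seconds.
from dataclasses import dataclass

@dataclass
class Construction:
    speech_start: float
    speech_end: float
    pen_start: float
    pen_end: float

def integration_label(c: Construction) -> str:
    """Label one construction by whether its speech and pen signals overlap."""
    overlaps = c.speech_start < c.pen_end and c.pen_start < c.speech_end
    return "simultaneous" if overlaps else "sequential"

def dominant_pattern(constructions: list[Construction]) -> tuple[str, float]:
    """Return (dominant pattern, consistency ratio) for one user's session."""
    labels = [integration_label(c) for c in constructions]
    simultaneous_ratio = labels.count("simultaneous") / len(labels)
    if simultaneous_ratio >= 0.5:
        return "simultaneous", simultaneous_ratio
    return "sequential", 1.0 - simultaneous_ratio
```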
Multimodal interfaces are designed with a focus on flexibility, although very few currently are capable of adapting to major sources of user, task, or environmental variation. The development of adaptive multimodal processing techniques will require empirical guidance from quantitative modeling of key aspects of individual differences, especially as users engage in different types of tasks in different usage contexts. In the present study, data were collected from fifteen healthy seniors aged 66 to 86 as they interacted with a map-based flood management system using multimodal speech and pen input. A comprehensive analysis of multimodal integration patterns revealed that seniors were classifiable as either simultaneous or sequential integrators, just as children and adults are. Seniors also demonstrated early predictability and a high degree of consistency in their dominant integration pattern. However, greater individual differences in multimodal integration were generally evident in this population. Perhaps surprisingly, during sequential constructions seniors' intermodal lags were no longer, in either average or maximum duration, than those of younger adults, although both of these groups had longer maximum lags than children. However, an analysis of seniors' performance did reveal lengthy latencies before initiating a task, and high rates of self-talk and task-critical errors while completing spatial tasks. All of these behaviors were magnified as task difficulty increased. Results of this research have implications for the design of adaptive processing strategies appropriate for applications aimed at seniors, especially for the development of temporal thresholds used during multimodal fusion. The long-term goal of this research is the design of high-performance multimodal systems that adapt to a full spectrum of diverse users, supporting tailored and robust future systems.
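As one illustration of how such temporal thresholds might be made user-adaptive, the sketch below chooses a fusion wait window (how long a fusion engine holds a pen signal while waiting for companion speech, or vice versa) from a user's observed intermodal lags, bounded by group-level defaults. The percentile, floor, and default values are hypothetical parameters, not figures reported in this research.

```python
# Minimal sketch with hypothetical parameters: picking a user-adaptive wait
# window for multimodal fusion. After receiving one signal, the fusion engine
# waits up to this long for a companion signal before treating the input as
# unimodal. The window is a high percentile of the user's observed intermodal
# lags, clamped between a floor and a group-level default maximum.

GROUP_DEFAULT_MAX_LAG_S = {  # hypothetical upper bounds, in seconds
    "child": 2.0,
    "adult": 4.0,
    "senior": 4.0,
}

def fusion_wait_window(observed_lags_s: list[float],
                       group: str = "adult",
                       percentile: float = 0.95,
                       floor_s: float = 0.5) -> float:
    """Seconds the fusion engine should wait for a second modality."""
    default = GROUP_DEFAULT_MAX_LAG_S[group]
    if not observed_lags_s:
        return default  # no per-user evidence yet: fall back to group default
    lags = sorted(observed_lags_s)
    estimate = lags[min(len(lags) - 1, int(percentile * len(lags)))]
    return min(max(estimate, floor_s), default)

# Example: a senior whose logged lags are mostly short but occasionally long.
print(fusion_wait_window([0.4, 0.9, 1.1, 2.6], group="senior"))  # -> 2.6
```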
Mobile usage patterns often entail high and fluctuating levels of difficulty as well as dual tasking. One major theme explored in this research is whether a flexible multimodal interface supports users in managing cognitive load. Findings from this study reveal that multimodal interface users spontaneously respond to dynamic changes in their own cognitive load by shifting to multimodal communication as load increases with task difficulty and communicative complexity. Given a flexible multimodal interface, users' ratio of multimodal (versus unimodal) interaction increased substantially, from 18.6% when referring to established dialogue context to 77.1% when required to establish a new context, a +315% relative increase. Likewise, the ratio of users' multimodal interaction increased significantly as tasks became more difficult, from 59.2% during low-difficulty tasks to 65.5% at moderate, 68.2% at high, and 75.0% at very high difficulty, an overall relative increase of +27%. Users' task-critical errors and response latencies also increased systematically and significantly across task difficulty levels, corroborating the manipulation of cognitive processing load. The adaptations seen in this study reflect users' efforts to self-manage limitations on working memory as task complexity increases. They accomplish this by distributing communicative information across multiple modalities, which is compatible with a cognitive load theory of multimodal interaction. The long-term goal of this research is the development of an empirical foundation for proactively guiding flexible and adaptive multimodal system design.
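For clarity, the relative-increase figures quoted above follow directly from the raw percentages; a brief sketch of the arithmetic:

```python
# Arithmetic behind the figures quoted above (percentages from this abstract).
def relative_increase(before_pct: float, after_pct: float) -> float:
    """Relative change expressed as a percentage of the starting value."""
    return (after_pct - before_pct) / before_pct * 100

print(relative_increase(18.6, 77.1))  # ~314.5, reported as a +315% increase
print(relative_increase(59.2, 75.0))  # ~26.7, reported as a +27% increase
```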
As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. In the present research, quantitative modeling is presented on the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and during tasks varying in difficulty. Using a new dual-wizard simulation method, data were collected from twelve adults as they completed map-based tasks using multimodal speech and pen input. Analyses based on over 1,600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to a simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their intermodal lag or their signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework for, and insights into, multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.
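The entrenchment effect described above could be tracked with a simple co-timing measure per construction, as in the sketch below: positive values give the overlap duration of a simultaneous construction, negative values the intermodal lag of a sequential one, and entrenchment appears as a drift in this measure over the session. The measure and the half-session comparison are illustrative simplifications, not the analyses conducted in the study.

```python
# Minimal sketch, with illustrative names and a simplified analysis: a signed
# co-timing measure per construction (positive = overlap duration of a
# simultaneous construction, negative = intermodal lag of a sequential one),
# plus a first-half vs. second-half comparison of a session. Entrenchment
# appears as overlap growing for simultaneous integrators (positive delta)
# and lag growing for sequential integrators (negative delta).
from statistics import mean

def cotiming_measure(speech: tuple[float, float],
                     pen: tuple[float, float]) -> float:
    """Signed co-timing in seconds for one speech-and-pen construction."""
    start = max(speech[0], pen[0])
    end = min(speech[1], pen[1])
    return end - start  # > 0: overlap duration; < 0: negative of the lag

def entrenchment_delta(measures: list[float]) -> float:
    """Mean co-timing in the second half of a session minus the first half."""
    if len(measures) < 2:
        return 0.0
    half = len(measures) // 2
    return mean(measures[half:]) - mean(measures[:half])
```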