Dynamic map displays are visual interfaces that show the spatial positions of objects of interest (e.g., people, robots, vehicles) and can be updated in response to user commands as well as changes in the world, often in real time. Multimodal (speech and touch) controls were designed for a U.S. Army Research Laboratory dynamic map display to allow users to provide supervisory control of a simulated robotic swarm. This study characterized the effects of user performance (input difficulty, modality preference, and response to different levels of workload) on multimodal intercommand time (i.e., temporal binding), and explored how this might relate to the system's ability to bind, or fuse, a user's multimodal inputs into a unitary response. User performance was tested in a laboratory study with 6 male and 6 female volunteers (mean age 26 years). Results showed that 64% of participants always issued the speech command first, while the remaining participants always issued the touch command first. Temporal binding between touch and speech commands was significantly shorter for touch-first than for speech-first commands, regardless of workload level. For both speech and touch commands, temporal binding was significantly shorter for roads and swarm edges than for intersections. The results indicate that all of these factors can bear significantly on a system's ability to bind multimodal inputs into a unitary response. Suggestions for future research are described.