Abstract-In theory, autonomous robotic swarms can be used for critical Army tasks, including accompanying vehicle convoys to provide security and enhance situational awareness. However, the Soldier providing swarm supervisory control must be able to correct swarm actions, especially in disrupted or degraded conditions. Dynamic map displays are visual interfaces that can be useful for swarm supervisory control tasks because they can show the spatial positions of objects of interest (e.g., people, robots, swarm members, and vehicles) at different locations (e.g., on roads and at intersections) while accepting user commands and reflecting world changes, often in real time. In this study, multimodal speech and touch controls were designed for a U.S. Army Research Laboratory dynamic map display to allow users to provide supervisory control of a simulated robotic swarm. The experiment explored the use of sequential multimodal touch and speech commands for placing swarm-related map objects at different map locations. The criterion variable was temporal binding, defined as the time between the onsets of the two commands in the sequence, relative to the system's ability to fuse the two sequential commands into a unitary response. User preference for the modality of the first command was also measured. These concepts were tested in a laboratory study using 12 male Marine volunteers with a mean age of 19 years. Results indicated significant differences in temporal binding for different map objects and map locations. Additionally, nine of the 12 Marines used speech commands first approximately 75% or more of the time, while the remaining three Marines used touch commands first approximately 75% or more of the time. Temporal binding was significantly shorter for touch-first than for speech-first commands. Suggestions for future research and future applications to robotic command and control systems are described.
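To make the temporal binding measure concrete, the following is a minimal illustrative sketch, not the system used in the study. It assumes each command is logged with a modality, an onset time in seconds, and a text payload, and that two commands of different modalities are fused only when the second onset falls within a fixed window after the first. The names `Command`, `temporal_binding`, `fuse`, and the 2.0 s value of `FUSION_WINDOW_S` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    """One user command (hypothetical record layout, not from the study)."""
    modality: str   # "speech" or "touch"
    onset_s: float  # command onset time, in seconds
    payload: str    # e.g., a map object name or a map location


# Hypothetical fusion window; the study's actual threshold is not stated here.
FUSION_WINDOW_S = 2.0


def temporal_binding(first: Command, second: Command) -> float:
    """Time between the onsets of two sequential commands, in seconds."""
    return second.onset_s - first.onset_s


def fuse(first: Command, second: Command,
         window_s: float = FUSION_WINDOW_S) -> Optional[dict]:
    """Fuse a speech/touch pair into one response, or return None.

    Fusion is refused when both commands share a modality, when the
    second command starts before the first, or when the gap between
    onsets exceeds the window.
    """
    if first.modality == second.modality:
        return None
    gap = temporal_binding(first, second)
    if gap < 0 or gap > window_s:
        return None
    return {
        "first_modality": first.modality,
        "binding_s": gap,
        "parts": {first.modality: first.payload,
                  second.modality: second.payload},
    }
```

Under these assumptions, a touch at 10.0 s followed by a speech command at 10.5 s would be fused as a touch-first pair with a temporal binding of 0.5 s, whereas a speech command arriving 3.0 s after the touch would be left unfused.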