In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). These dependencies form a rich set of constraints from various perspectives, such as temporal alignment between modalities, coherence of the conversation, and domain semantics. There is strong evidence that competition and ranking among these constraints are important for achieving an optimal interpretation. We have therefore developed an optimization approach for multimodal interpretation, particularly for interpreting multimodal references. A preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures.
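To make the idea of constraint competition and ranking concrete, the following is a minimal sketch of scoring candidate referents by a weighted combination of constraint scores. The constraint names, weights, and example values are illustrative assumptions, not the approach's actual formulation.

```python
# Minimal sketch of constraint-ranked reference resolution (illustrative only;
# the constraint names, weights, and scores are assumptions, not the authors'
# implementation).
from dataclasses import dataclass

@dataclass
class Candidate:
    object_id: str
    temporal_score: float    # alignment between gesture time and speech time
    semantic_score: float    # match between spoken description and object type
    contextual_score: float  # salience from conversation focus / visual feedback

def rank_candidates(candidates, weights=(0.4, 0.4, 0.2)):
    """Rank candidate referents by a weighted combination of constraint scores."""
    w_t, w_s, w_c = weights
    scored = [
        (w_t * c.temporal_score + w_s * c.semantic_score + w_c * c.contextual_score, c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# Example: resolving "this red one" accompanied by a pointing gesture.
candidates = [
    Candidate("house_3", temporal_score=0.9, semantic_score=0.8, contextual_score=0.5),
    Candidate("house_7", temporal_score=0.2, semantic_score=0.9, contextual_score=0.7),
]
best = rank_candidates(candidates)[0]
print(best.object_id)  # the highest-ranked referent under these assumed weights
```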
Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of users' multimodal inputs is important. Inspired by previous investigations of cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and the Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results show the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation.
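The following is a rough sketch of a greedy pairing of referring expressions with gesture-selected objects, in the spirit of the algorithm described above. The compatibility function, data shapes, and field names are assumptions for illustration and do not reproduce the published algorithm.

```python
# Minimal sketch of greedy multimodal reference resolution (illustrative only;
# the compatibility score and record formats are assumptions, not the authors'
# algorithm).
def compatibility(expression, gesture):
    """Assumed score combining temporal proximity and semantic type agreement."""
    time_gap = abs(expression["time"] - gesture["time"])
    temporal = 1.0 / (1.0 + time_gap)
    semantic = 1.0 if expression["type"] in (gesture["object_type"], "any") else 0.0
    return temporal * semantic

def greedy_resolve(expressions, gestures):
    """Greedily assign each referring expression to its best still-unused gesture."""
    assignments, used = {}, set()
    for expr in sorted(expressions, key=lambda e: e["time"]):
        best, best_score = None, 0.0
        for i, gesture in enumerate(gestures):
            if i in used:
                continue
            score = compatibility(expr, gesture)
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            assignments[expr["text"]] = gestures[best]["object_id"]
            used.add(best)
    return assignments

expressions = [{"text": "this house", "type": "house", "time": 1.2},
               {"text": "that road", "type": "road", "time": 2.8}]
gestures = [{"object_id": "house_3", "object_type": "house", "time": 1.0},
            {"object_id": "road_5", "object_type": "road", "time": 2.5}]
print(greedy_resolve(expressions, gestures))
```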
A re-configurable, portable test station was developed for integrating and testing real-time performance metrics that continuously assess operator effectiveness in operationally relevant spaceflight piloting tasks. The test station uses a single computer to host the vehicle simulation, render both the graphical flight displays and a 3-D out-the-window view, and compute the performance metrics in real time. The pilot interacts with the simulation through four displays (two piloting displays, one out-the-window display, and a mission summary display), a rotational hand controller, a translational hand controller, and a microphone. A fifth display provides a system status / engineering view for the experimenter. A key component of the simulation station is the real-time metrics engine and its algorithms, which estimate pilot workload, situation awareness, and flight performance without interfering with the piloting task or adding equipment or infrastructure to the flight deck. Workload and flight performance are estimated from an analysis of the vehicle state (e.g., attitude, altitude, percent fuel) and the pilot commands (e.g., hand controller movement), whereas situation awareness is estimated by comparing the actual vehicle state with the state spoken by the flying pilot (converted to text by an automatic speech recognition algorithm). This real-time simulation station development is discussed in the context of four operationally relevant spaceflight tasks: piloted lunar landing, Orion/MPCV docking operations with the International Space Station (ISS), and manual control of the spacewalking Simplified Aid for EVA Rescue (SAFER) jet pack near the ISS.
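As an illustration of the state-versus-callout comparison described for the situation awareness metric, the following is a minimal sketch. The state variables, tolerances, and callout parsing are assumptions for illustration, not the test station's actual metrics algorithms.

```python
# Minimal sketch of scoring situation awareness by comparing a pilot's spoken
# callout (already converted to text by ASR) against the actual vehicle state.
# Field names, tolerances, and parsing logic are assumptions, not the test
# station's implementation.
import re

TOLERANCES = {"altitude": 50.0, "fuel": 2.0}  # assumed acceptable callout errors

def parse_callout(text):
    """Pull numeric values for known state variables out of a spoken callout."""
    values = {}
    for name in TOLERANCES:
        match = re.search(rf"{name}\s+(-?\d+(?:\.\d+)?)", text.lower())
        if match:
            values[name] = float(match.group(1))
    return values

def situation_awareness_score(callout_text, actual_state):
    """Fraction of called-out state variables that fall within tolerance."""
    reported = parse_callout(callout_text)
    if not reported:
        return 0.0
    hits = sum(
        1 for name, value in reported.items()
        if abs(value - actual_state[name]) <= TOLERANCES[name]
    )
    return hits / len(reported)

actual = {"altitude": 1520.0, "fuel": 37.5}
print(situation_awareness_score("altitude 1500, fuel 30 percent", actual))  # 0.5
```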