Benefiting from knowledge of speech, language and hearing ─ accumulated by many researchers over nearly a century ─ new technology is beginning to serve users of complex information systems. This technology aims for a natural communication environment, capturing attributes that humans favor in face-to-face exchange. Ideally, the environment provides three-dimensional spatial realism in the sensory dimensions of sight, sound and touch. Conversational interaction bears a central burden, with visual and manual signaling simultaneously supplementing the communication process. Current research therefore addresses multimodal interfaces that can transcend the limitations of mouse and keyboard. In addition to instrumenting sensors for each mode, the interface must incorporate context-aware algorithms for fusing and interpreting multiple sensory channels. The ultimate objective is a reliable estimate of user intent, from which actionable responses can be made. This report describes the early status of multimodal interfaces and identifies emerging opportunities for enhanced usability and naturalness. It concludes by advocating focused research on a frontier issue ─ the formulation of a quantitative language framework for multimodal communication.

PERSPECTIVE

Over the past few years, society has enjoyed exceptional gains in productivity. A large part of this advance is due to the benefits of information technology ─ computing, networking, and software. Processor speeds steadily increase as costs decline, and broadband transport capacity is becoming pervasive. A central issue is how to employ these advantages to maintain the hard-won momentum in productivity. This report argues that advanced computing and networking are appropriately employed in creating a communication environment for human users that is as natural and habitable as face-to-face information exchange. Implied is the (presently unrealistic) ideal of three-dimensional spatial realism.