This paper presents two passive pointing systems for a distant screen based on an acoustic position estimation technology. These systems are designed for interaction with a distant screen, such as a television set at home or digital signage in public, as an alternative to a touch screen. The first system consists of a distant screen, three loudspeakers set around the screen, and two microphones as a pointing device. The second system consists of a distant screen, two loudspeakers set around the screen, and a smartphone with a built-in microphone and gravity sensor as a pointing device. The position of the pointer on the screen is theoretically determined by the position and direction of the pointing device in space. The second system approximates this using the two-dimensional position of the microphone for the horizontal coordinate and the pitch angle from the gravity sensor for the vertical coordinate. In this paper, we report experiments to evaluate the performance of these systems. The loudspeakers radiate burst signals from 18 to 24 kHz. The position of the microphone is estimated at a frame rate of 15 frames per second with a latency of 0.4 s. The pointing accuracy was measured, with an angular error below 10 degrees in 100% of frames. We confirmed that the systems are accurate enough to point to one of several partitioned areas on the screen.
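The second system's approximation can be sketched as follows: the horizontal pointer coordinate comes directly from the microphone's estimated position, and the vertical coordinate from the pitch angle times the distance to the screen. This is a minimal illustration assuming a flat screen and a simple geometric model; the function name, coordinate convention, and flat-screen geometry are assumptions, not the paper's implementation.

```python
import math

def approx_pointer(mic_x, mic_dist, pitch_rad):
    """Approximate the on-screen pointer as in the second system.

    mic_x     -- horizontal position of the microphone relative to the
                 screen center (meters); used directly as the horizontal
                 pointer coordinate.
    mic_dist  -- distance from the microphone to the screen plane (meters).
    pitch_rad -- pitch angle of the smartphone from the gravity sensor
                 (radians, 0 = pointing horizontally at the screen).

    Returns (x, y) pointer coordinates on the screen plane (meters).
    """
    x = mic_x
    # Vertical offset: distance to the screen times the tangent of the pitch.
    y = mic_dist * math.tan(pitch_rad)
    return x, y
```

For example, a device held 2 m from the screen and tilted up by about 14 degrees (tan ≈ 0.25) would place the pointer roughly 0.5 m above the point directly in front of the microphone.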
Multimodal representation of conversational agents requires accurate synchronization of gesture and speech. For this purpose, we investigate the key issues in synchronization through a preliminary case study, as a practical guideline for our algorithm design, and propose a two-step synchronization approach. Our case study reveals that two issues (i.e., duration and timing) play an important role in manually synchronizing gesture with speech. Treating synchronization as a motion synthesis problem rather than the behavior scheduling problem of conventional methods, we use a motion graph technique with constraints on gesture structure in a first step, for coarse synchronization, and refine the result by shifting and scaling the motion in a second step. This approach successfully synchronizes gesture and speech with respect to both duration and timing. We have confirmed that our system makes creating attractive content easier than manual creation of equal quality. In addition, subjective evaluation has demonstrated that the proposed approach achieves more accurate synchronization and higher motion quality than the state-of-the-art method.
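The second refinement step, shifting and scaling a gesture in time to match the speech, can be sketched as a uniform retiming of the gesture's keyframes. This is a hypothetical interface for illustration only; the paper's actual refinement operates under gesture-structure constraints not modeled here.

```python
def retime_gesture(keyframe_times, target_start, target_end):
    """Shift and uniformly scale a gesture's keyframe times so the motion
    spans [target_start, target_end], e.g. the interval of the co-occurring
    speech segment. Assumes at least two keyframes in increasing order.
    """
    t0, t1 = keyframe_times[0], keyframe_times[-1]
    scale = (target_end - target_start) / (t1 - t0)  # duration adjustment
    # Shift to target_start, then scale every offset from the first keyframe.
    return [target_start + (t - t0) * scale for t in keyframe_times]
```

For instance, a gesture with keyframes at 0 s, 1 s, and 2 s retimed to the speech interval [5 s, 9 s] yields keyframes at 5 s, 7 s, and 9 s, fixing both its timing (shift) and its duration (scale).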