When using motion gestures, 3D movements of a mobile phone, as an input modality, one significant challenge is teaching end users the movement parameters necessary to successfully issue a command. Is a simple video or image depicting the movement of a smartphone sufficient? Or do we need three-dimensional depictions of movement on external screens to train users? In this paper, we explore mechanisms for teaching end users motion gestures, examining two factors. The first is how to represent motion gestures: as icons that describe movement, as video that depicts movement using the smartphone screen, or via a Kinect-based teaching mechanism that captures and depicts the gesture on an external display in three-dimensional space. The second is recognizer feedback, i.e., a simple representation of the proximity of a performed motion gesture to the desired gesture, based on a distance metric extracted from the recognizer. We show that, by combining video with recognizer feedback, participants master motion gestures as quickly as end users who learn using a Kinect. These results demonstrate the viability of training end users to perform motion gestures using only the smartphone display.
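The abstract does not specify the distance metric behind the recognizer feedback; a minimal sketch, assuming dynamic time warping over a 1-D accelerometer trace and a hypothetical `worst` cap used only to normalize the score for display:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D accelerometer traces."""
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three adjacent alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def proximity_feedback(attempt, template, worst=10.0):
    """Map the distance onto a 0..1 closeness score for on-screen feedback
    (1.0 = matches the template exactly; `worst` is an assumed cap)."""
    return max(0.0, 1.0 - dtw_distance(attempt, template) / worst)
```

An identical attempt and template yield a score of 1.0; increasingly dissimilar traces decay toward 0.0, which is the kind of simple proximity display the abstract describes.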
This paper describes a method for determining an object's pose given its 3D model and a 2D view. This 2D-3D registration problem arises in a number of medical applications, e.g., image-guided spine procedures. Previous approaches often rely on a good initial estimate of the pose parameters and an optimization procedure to refine this initial pose estimate, e.g., the iterative closest point (ICP) algorithm. However, such algorithms can mistake local minima for global minima, leading to registration errors, if the initial pose is not carefully chosen. The specification of appropriate initial conditions, however, requires user interaction and is time consuming. We propose an approach in which sample 2D views are generated from the 3D model and matched against the given view (2D-3D registration). Additional views are then generated in the vicinity of the best view, and the procedure is repeated until convergence. Results of estimating the pose of a vertebra from its 3D model, obtained from volumetric (CT or MR) data, and a 2D view, as might be obtained from fluoroscopic data, demonstrate that the pose can be reliably obtained without requiring extensive user interaction.
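The generate-match-refine loop described above can be sketched as a coarse-to-fine random search over pose parameters. Everything below is an illustrative assumption rather than the paper's implementation: the pose is reduced to three Euler angles, an orthographic projection stands in for the fluoroscopic view, and the sampling spread shrinks each round:

```python
import numpy as np

def rotate(points, angles):
    """Rotate Nx3 points by Euler angles (rx, ry, rz)."""
    rx, ry, rz = angles
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    return points @ (Rz @ Ry @ Rx).T

def project(points):
    """Orthographic projection onto the x-y plane (stand-in for a 2D view)."""
    return points[:, :2]

def view_error(model, target_2d, angles):
    """How far the generated view is from the given 2D view."""
    return np.sum((project(rotate(model, angles)) - target_2d) ** 2)

def estimate_pose(model, target_2d, n_samples=50, n_rounds=30, spread=1.0, seed=0):
    """Sample candidate views near the current best, keep the best match,
    shrink the neighborhood, and repeat until the search converges."""
    rng = np.random.default_rng(seed)
    best = np.zeros(3)
    best_err = view_error(model, target_2d, best)
    for _ in range(n_rounds):
        candidates = best + rng.normal(scale=spread, size=(n_samples, 3))
        for c in candidates:
            err = view_error(model, target_2d, c)
            if err < best_err:
                best, best_err = c, err
        spread *= 0.7  # narrow the search around the best view so far
    return best, best_err
```

Shrinking the spread mirrors the paper's idea of generating additional views "in the vicinity of the best view": early rounds explore broadly to avoid committing to a poor initial pose, later rounds refine locally.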
Bi-level thresholding is a motion gesture recognition technique that mediates between false positives and false negatives by using two threshold levels: a tighter threshold that limits false positives and recognition errors, and a looser threshold that prevents repeated errors (false negatives) by analyzing movements in sequence. In this paper, we examine the effects of bi-level thresholding on the workload and acceptance of end users. Using a Wizard-of-Oz recognizer, we hold recognition rates constant and compare fixed versus bi-level thresholding. Given identical recognition rates, we show that systems using bi-level thresholding result in significantly lower workload scores on the NASA-TLX and significantly lower accelerometer variance. Overall, these results argue for the viability of bi-level thresholding as an effective technique for balancing false positives, recognition errors, and false negatives.
In gesture recognition, one challenge that researchers and developers face is the need for recognition strategies that mediate between false positives and false negatives. In this article, we examine bi-level thresholding, a recognition strategy that uses two thresholds: a tighter threshold limits false positives and recognition errors, and a looser threshold prevents repeated errors (false negatives) by analyzing movements in sequence. We first describe early observations that led to the development of the bi-level thresholding algorithm. Next, using a Wizard-of-Oz recognizer, we hold recognition rates constant and adjust for fixed versus bi-level thresholding; we show that systems using bi-level thresholding result in significantly lower workload scores on the NASA-TLX and significantly lower accelerometer variance when performing gesture input. Finally, we examine the effect that bi-level thresholding has on a real-world dataset of wrist and finger gestures, showing an ability to significantly improve measures of precision and recall. Overall, these results argue for the viability of bi-level thresholding as an effective technique for balancing between false positives, recognition errors, and false negatives.
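A minimal sketch of the two-threshold idea described in these abstracts. The threshold values and the exact sequencing rule are illustrative assumptions: scores are treated as distances to a gesture template (lower is closer), and a near miss, a score between the tight and loose thresholds, primes the looser threshold for the immediately following attempt only:

```python
def recognize_sequence(scores, tight=0.2, loose=0.4):
    """Bi-level thresholding sketch: accept a movement if its score beats the
    tight threshold, or beats the loose threshold right after a near miss."""
    results = []
    near_miss = False
    for s in scores:
        # A preceding near miss relaxes the bar for this attempt only.
        threshold = loose if near_miss else tight
        accepted = s <= threshold
        # Rejected attempts that still fall under the loose threshold count
        # as near misses, priming the looser threshold for the next attempt.
        near_miss = (not accepted) and (s <= loose)
        results.append(accepted)
    return results
```

For example, a first attempt scoring 0.3 is rejected under the tight threshold, but an immediate retry at 0.3 is accepted under the loose one, which is how the technique prevents repeated false negatives without loosening the threshold for cold attempts.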