Typing on a touchscreen keyboard is very difficult without being able to see the keyboard. We propose a new approach in which users imagine a Qwerty keyboard somewhere on the device and tap out an entire sentence without any visual reference to the keyboard and without intermediate feedback about the letters or words typed. To demonstrate the feasibility of our approach, we developed an algorithm that decodes blind touchscreen typing with a character error rate of 18.5%. Our decoder currently uses three components: a model of the keyboard topology and tap variability, a point transformation algorithm, and a long-span statistical language model. Our initial results demonstrate that our proposed method provides fast entry rates and promising error rates. On one-third of the sentences, novices' highly noisy input was successfully decoded with no errors.
MOTIVATION AND APPROACH
Entering text on a touchscreen mobile device typically involves visually-guided tapping on a Qwerty keyboard. For users who are blind, visually-impaired, or using a device eyes-free, such visually-guided tapping is difficult or impossible. Existing approaches are slow (e.g. the split-tapping method of the iPhone's VoiceOver feature), require chorded Braille input (e.g. Perkinput [1], BrailleTouch [3]), or require word-at-a-time confirmation and correction (e.g. the Fleksy iPhone/Android app by Syntellia).

Rather than designing a letter- or word-at-a-time recognition interface, we present initial results on an approach in which recognition is postponed until an entire sentence of noisy tap data has been collected. This may improve users' efficiency by avoiding the distraction of intermediate letter- or word-level recognition results. Users enter a whole sequence of taps on a keyboard they imagine somewhere on the screen but cannot actually see. We then decode the user's entire intended sentence from the imprecise tap data. Our recognizer searches for the most likely character sequence under a probabilistic keyboard and language model.

The keyboard model places a 2D Gaussian with a diagonal covariance matrix on each key. For each tap, the model produces a likelihood for each of the possible letters on the keyboard, with higher likelihoods for letters closer to the tap's location. Our 9-gram character language model uses Witten-Bell smoothing and was trained on billions of words of Twitter, Usenet, and blog data. The language model has 9.8M parameters and a compressed disk size of 67 MB.

Since users are imagining the keyboard's location and size, their actual tap locations are unlikely to correspond well with any fixed keyboard location. We compensate for this by geometrically transforming the tap points as shown in Figure 1. We allow taps to be scaled along the x- and y-dimensions, translated horizontally and vertically, and rotated by up to 20 degrees. We also search for two multiplicative factors that adjust the x- and y-variance of the 2D Gaussians (see the sketches after this section).

Our current decoder operates offline, finding the best ...
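The keyboard model and point transformation are described above in prose only; the following is a minimal sketch, in Python with NumPy, of how a transformed tap might be scored under per-key 2D Gaussians with diagonal covariance. The parameterization (x/y scale, translation, rotation, and two variance factors) follows the description, but the function names and data layout are our own illustration, not the authors' implementation.

```python
import numpy as np

def transform_taps(taps, sx, sy, tx, ty, theta):
    """Scale, translate, and rotate raw tap points (N x 2 array).

    sx, sy: scale factors along x and y
    tx, ty: horizontal and vertical translation
    theta:  rotation angle in radians (the paper limits rotation to ~20 degrees)
    """
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    scaled = taps * np.array([sx, sy])
    return scaled @ rot.T + np.array([tx, ty])

def tap_log_likelihoods(tap, key_centers, key_var, vx, vy):
    """Log-likelihood of one (transformed) tap under each key's 2D Gaussian.

    key_centers: dict mapping letter -> (x, y) center of that key
    key_var:     (var_x, var_y) baseline diagonal covariance of a key
    vx, vy:      multiplicative factors adjusting the x- and y-variance
    """
    var_x, var_y = key_var[0] * vx, key_var[1] * vy
    scores = {}
    for letter, (cx, cy) in key_centers.items():
        dx, dy = tap[0] - cx, tap[1] - cy
        # Diagonal-covariance Gaussian log-density.
        scores[letter] = (-0.5 * (dx * dx / var_x + dy * dy / var_y)
                          - 0.5 * np.log(4 * np.pi ** 2 * var_x * var_y))
    return scores
```

Letters near the transformed tap location receive higher log-likelihoods; in the decoder these keyboard scores are combined with the character language model, and the transform and variance parameters are themselves searched over rather than fixed.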
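The section states that the recognizer searches for the most likely character sequence under the combined keyboard and language model, but does not specify the search procedure. The sketch below uses a generic beam search over characters, assuming keyboard and language-model scores are summed in log space; the beam width, the `lm_logprob` interface, and the one-letter-per-tap assumption are illustrative choices, not details from the paper.

```python
import heapq

def beam_decode(tap_scores, lm_logprob, beam_width=50):
    """Search for a likely character sequence given per-tap letter scores.

    tap_scores: list (one dict per tap) of letter -> keyboard log-likelihood
    lm_logprob: function (history_string, letter) -> character LM log-probability
    Returns the highest-scoring hypothesis string.
    """
    beam = [("", 0.0)]  # (hypothesis, total log-probability)
    for scores in tap_scores:
        candidates = []
        for hyp, logp in beam:
            for letter, kb_logp in scores.items():
                total = logp + kb_logp + lm_logprob(hyp, letter)
                candidates.append((hyp + letter, total))
        # Keep only the best hypotheses to bound the search.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return max(beam, key=lambda c: c[1])[0]
```

With the per-tap scores from the previous sketch as `tap_scores` and a character 9-gram model behind `lm_logprob`, the returned hypothesis is the decoder's best guess at the intended sentence.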