SUMMARY
This paper describes how a robust and compact on-line handwritten Japanese text recognizer was developed for deployment on hand-held devices by compressing each component of an integrated text recognition system: an SVM classifier that evaluates segmentation points, a combined on-line and off-line character recognizer, a linguistic context processor, and a geometric context evaluation module. Selecting an elastic-matching-based on-line recognizer and compressing MQDF2 through a combination of LDA, vector quantization, and data type transformation contributed to building a remarkably small yet robust recognizer. The compact text recognizer covers 7,097 character classes and requires only about 15 MB of memory while achieving 93.11% accuracy on horizontal text lines extracted from the TUAT Kondate database. Compared with the original full-scale Japanese text recognizer, the memory size is reduced from 64.1 MB to 14.9 MB, while the accuracy drops by only about 0.5 percentage points, from 93.6% to 93.11%. The method is scalable: systems smaller than 11 MB and smaller than 6 MB still retain 92.80% and 90.02% accuracy, respectively.
key words: on-line recognition, handwritten text recognition, elastic matching, MQDF, vector quantization
Introduction
With the spread of pen-based and touch-based hand-held devices, handwritten text recognition systems that run on such devices need to be developed. Because a hand-held device has relatively little RAM, the recognition system must be as small as possible while maintaining high accuracy. This paper focuses on constructing a compact on-line handwritten Japanese text recognition system that runs on such devices.

Handwritten character recognition methods fall into two main types: on-line and off-line. The on-line method works on an on-line handwritten character pattern, which is a time sequence of pen-tip coordinates. Although it is easily made robust to stroke connection and deformation, it is sensitive to stroke order variation. The off-line method, on the other hand, recognizes an off-line pattern, which is a character pattern image. Although it is insensitive to stroke order variation and duplicated strokes, it is not very robust to stroke connection and deformation. To overcome the disadvantage of the on-line method, the off-line method is combined with it to form a combined recognizer; this is possible because the off-line method can be applied to an on-line pattern by discarding its temporal information [3]. Moreover, Japanese has a large character set that includes thousands of ideographic characters of Chinese origin, two sets of phonetic characters, alphanumeric characters, and symbols, so designing a compact yet robust text recognition system that runs on hand-held devices is challenging. Each component of the text recognizer must be compressed while keeping accuracy high.

For Japanese and Chinese off-line handwritten character recognition, MQDF2 [4] has been widely used, but its performance depends on two parameters: the dimensionality of the feature vector and the number of p...