Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters.This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition.A word-level recognition system for machine-printed Arabic text has been implemented. Arabic is a script language, and is therefore difficult to segment at the character level. Character segmentation has been avoided by recognizing text imagery of complete words. The Arabic recognition system computes a vector of image-morphological features on a query word image. This vector is matched against a precomputed database of vectors from a lexicon of Arabic words. Vectors from the database with the highest match score are returned as hypotheses for the unknown image. Several feature vectors may be stored for each word in the database. Database feature vectors generated using multiple fonts and noise models allow the system to be tuned to its input stream.Used in conjunction with database pruning techniques, this Arabic recognition system has obtained promising word recognition rates on low-quality multifont text imagery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.