Abstract. In recognizing cursive scripts, a major undertaking is segmenting cursive words into characters and isolating merged characters. The segmentation is usually the pivotal stage in the system to which a sizable portion of processing is devoted and a considerable share of recognition errors is attributed. The most notable feature of Arabic writing is its cursiveness. Compared to other features, the cursiveness of Arabic words poses the most difficult problem for recognition algorithms. In this work, we describe the design and implementation of an Arabic word recognition system. To recognize a word, the system does not segment it into characters in advance; rather, it recognizes the input word by detecting a set of "shape primitives" on the word. It then matches the regions of the word (represented by the detected primitives) with a set of symbol models. A spatial arrangement of symbol models that are matched to regions of the word, then, becomes the description of the recognized word. Since the number of potential arrangements of all symbol models is combinatorially large, the system imposes a set of constraints that pertain to word structure and spatial consistency. The system searches the space made up of the arrangements that satisfy the constraints, and tries to maximize the a posteriori probability of the arrangement of symbol models. We measure the accuracy of the system not only on words but on isolated characters as well. For isolated characters, it has a recognition rate of 99.7% for synthetically degraded symbols and 94.1% for scanned symbols. For isolated words the system has a recognition rate of 99.4% for noise-free words, 95.6% for synthetically degraded words, and 73% for scanned words.
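The constrained search described above can be illustrated with a minimal sketch. It is not the system's actual search procedure: it assumes, for illustration only, that the spatial-consistency constraint means the matched symbol models must tile the word's column range with no gaps or overlaps, that each symbol has an independent prior, and that a simple dynamic program stands in for the search. The names `best_arrangement`, `matches`, and `log_prior` are hypothetical, and the coordinates are plain column indices rather than a right-to-left layout.

```python
import math

def best_arrangement(matches, width, log_prior):
    """Return (score, symbols) for the best tiling of columns [0, width), or None.

    matches:   list of (symbol, start, end, log_likelihood) with start < end
    log_prior: dict mapping symbol -> log prior probability (assumed independent)
    The score is the log posterior up to an additive constant.
    """
    # best[x] = (score, symbols) for the best arrangement covering [0, x)
    best = {0: (0.0, [])}
    for x in range(1, width + 1):
        for symbol, start, end, log_lik in matches:
            # Only matches that end exactly at x and start where a valid
            # partial arrangement ends satisfy the tiling constraint.
            if end != x or start not in best:
                continue
            score = best[start][0] + log_lik + log_prior[symbol]
            if x not in best or score > best[x][0]:
                best[x] = (score, best[start][1] + [symbol])
    return best.get(width)

# Toy input: four hypothetical symbol models matched to regions of a 10-column word.
matches = [
    ("A", 0, 4, math.log(0.9)),
    ("B", 4, 10, math.log(0.8)),
    ("C", 0, 10, math.log(0.1)),
    ("D", 4, 10, math.log(0.3)),
]
log_prior = {s: math.log(0.25) for s in "ABCD"}
result = best_arrangement(matches, 10, log_prior)
```

In this toy input the two-symbol arrangement outscores the single wide match, and an arrangement that leaves a gap is never considered because no partial arrangement ends where its next symbol would have to start.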
This paper describes the design and implementation of a system that recognizes machine-printed Arabic words without prior segmentation. The technique is based on describing symbols in terms of shape primitives. At recognition time, the primitives are detected on a word image using mathematical morphology operations. The system then matches the detected primitives with symbol models. This leads to a spatial arrangement of matched symbol models. The system conducts a search in the space of spatial arrangements of models and outputs the arrangement with the highest posterior probability as the recognition of the word. The advantage of using this whole-word approach versus a segmentation approach is that the result of recognition is optimized with regard to the whole word. Results of preliminary experiments using a lexicon of 42,000 words show a recognition rate of 99.4% for noise-free text and 73% for scanned text.
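As an illustration of detecting a shape primitive with a morphological operation, the following minimal sketch applies binary erosion to a tiny bitmap. It is an assumption-laden toy, not the system's detector: the structuring elements, the image, and the function name `erode` are hypothetical, and pixels outside the image are treated as background.

```python
def erode(image, offsets):
    """Binary erosion: return the (row, col) positions where every offset of the
    structuring element lands on a foreground pixel (value 1)."""
    rows, cols = len(image), len(image[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            if all(
                0 <= r + dr < rows
                and 0 <= c + dc < cols
                and image[r + dr][c + dc] == 1
                for dr, dc in offsets
            ):
                hits.append((r, c))
    return hits

# A T-shaped stroke on a 4x5 bitmap (1 = ink, 0 = background).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical primitives: a 3-pixel horizontal bar and a 3-pixel vertical bar,
# each given as offsets relative to its centre pixel.
horizontal_bar = [(0, -1), (0, 0), (0, 1)]
vertical_bar = [(-1, 0), (0, 0), (1, 0)]

horizontal_hits = erode(image, horizontal_bar)
vertical_hits = erode(image, vertical_bar)
```

Each hit marks a location where the primitive fits entirely inside the ink; a set of such hits is the kind of evidence that could then be matched against symbol models.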