In this work, we investigate the combination of PGM (Probabilistic Graphical Model) classifiers, either independent or coupled, for the recognition of Arabic handwritten words. The independent classifiers are vertical and horizontal HMMs (Hidden Markov Models) whose observable outputs are features extracted from the image columns and the image rows, respectively. The coupled classifiers combine the vertical and horizontal observation streams into a single DBN (Dynamic Bayesian Network). We propose a novel method for extracting the word baseline, together with a set of simple, easily extractable features used to build feature vectors for the words in the vocabulary. Some of these features are statistical, based on pixel distributions and local pixel configurations. Others are structural, based on the presence of ascenders, descenders, loops and diacritic points. Experiments on handwritten Arabic words from the IFN/ENIT database strongly support the feasibility of the proposed approach. Recognition rates reach 90.42% with the vertical and horizontal HMMs, and 85.03% and 85.21% with a first and a second DBN respectively, outperforming the results of some other PGM-based works.
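To make the idea of two observation streams concrete, the following is a minimal sketch, not the authors' feature set. It assumes a binarized word image stored as a list of rows (1 for ink, 0 for background) and uses a single statistical feature, ink density, per column and per row. The function names `column_densities`, `row_densities` and `observation_streams` are hypothetical; the abstract does not specify the actual features beyond the categories listed above.

```python
from typing import List, Tuple

# Assumption: a binary word image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def column_densities(img: Image) -> List[float]:
    """Fraction of ink pixels in each column, in column-index order.

    This sequence would play the role of the observations of a vertical model.
    """
    if not img or not img[0]:
        return []
    height, width = len(img), len(img[0])
    return [sum(img[r][c] for r in range(height)) / height for c in range(width)]


def row_densities(img: Image) -> List[float]:
    """Fraction of ink pixels in each row, in row-index order.

    This sequence would play the role of the observations of a horizontal model.
    """
    if not img or not img[0]:
        return []
    width = len(img[0])
    return [sum(row) / width for row in img]


def observation_streams(img: Image) -> Tuple[List[float], List[float]]:
    """Return the (column-wise, row-wise) observation sequences for one image."""
    return column_densities(img), row_densities(img)


if __name__ == "__main__":
    # Tiny 2x3 toy image, purely illustrative.
    toy = [
        [0, 1, 1],
        [0, 1, 0],
    ]
    cols, rows = observation_streams(toy)
    print("column stream:", cols)
    print("row stream:", rows)
```

In an independent setup, each stream would be fed to its own classifier and the decisions combined afterwards; in a coupled setup, both streams would be modeled jointly. Either way, a real system would use a richer feature vector per column or row than the single density value shown here.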
Segmenting Arabic manuscripts into text lines and words is an important step toward making recognition systems more efficient and accurate. The main difficulty in this task is word extraction. First, words are often a succession of sub-words, and the spacing between these sub-words does not follow any rule. Second, connections can occur even between non-adjacent sub-words in the same text line, which makes it difficult to identify the parts of a word and to extract the word as a whole. This work proposes an automatic system for Arabic handwritten word extraction and recognition based on 1) localizing and segmenting touching characters, 2) extracting the real sub-words and structural features from word images, and 3) recognizing them with a Markovian classifier. The performance of the proposed system is tested on samples extracted from historical handwritten documents. The results are encouraging, with an average recognition rate of 87%.