In this paper we present a detailed review of current script and language identification techniques. The main criticism of the existing techniques is that most of them rely on either connected component analysis or character segmentation. We go on to present a new method for script identification, based on texture analysis, which does not require character segmentation. A uniform text block on which texture analysis can be performed is produced from a document image via simple processing. Multiple channel (Gabor) filters and grey level co-occurrence matrices are used in independent experiments to extract texture features. Test documents are classified based on the features of training documents using the K-NN classifier. Initial results of over 95% accuracy on the classification of 105 test documents from 7 scripts are very promising. The method is robust to noise and to the presence of foreign characters or numerals, and can be applied to very small amounts of text.
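The co-occurrence-matrix and K-NN steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`glcm`, `glcm_features`, `knn_classify`, `stripes`), the choice of energy and contrast as features, the two displacements, and the striped toy textures standing in for script classes are all assumptions made for illustration. The Gabor-filter variant is not shown.

```python
import math
from collections import Counter


def glcm(image, dx, dy, levels):
    """Normalised grey-level co-occurrence matrix for displacement (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def glcm_features(image, levels=2, offsets=((1, 0), (0, 1))):
    """Energy and contrast of the GLCM for each displacement (assumed feature set)."""
    features = []
    for dx, dy in offsets:
        p = glcm(image, dx, dy, levels)
        energy = sum(v * v for row in p for v in row)
        contrast = sum((i - j) ** 2 * p[i][j]
                       for i in range(levels) for j in range(levels))
        features += [energy, contrast]
    return features


def knn_classify(query, training, k=3):
    """Majority label among the k training vectors closest to `query`."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stripes(size, horizontal, phase=0):
    """Toy binary texture (hypothetical stand-in for a normalised text block)."""
    return [[((y if horizontal else x) + phase) % 2 for x in range(size)]
            for y in range(size)]


# Train on a few texture blocks per class, then classify an unseen block.
training = (
    [(glcm_features(stripes(n, True)), "horizontal") for n in (8, 10, 12)]
    + [(glcm_features(stripes(n, False)), "vertical") for n in (8, 10, 12)]
)
query = glcm_features(stripes(9, True, phase=1))
predicted = knn_classify(query, training)
```

Because the features are computed over the whole block, no character segmentation is involved, which is the property the abstract emphasises. Real document blocks would need more grey levels and more displacements than this toy setup uses.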
We propose a new method for calculating the skew angle of scanned document images. The method is designed to be insensitive to document layout, line spacing, font, graphics/images and, most importantly, the language or script of the document. This is achieved by examining the Fourier spectra of blocks of the document image for peak pairs corresponding to the angle of skew. From a histogram compiled over all blocks in the document image, the correct skew angle can be determined to within approximately 0.5° regardless of document script, even when the image contains considerable graphical information.
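The block-wise Fourier idea in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it takes only the single strongest non-DC spectral peak per block, uses a plain (slow) DFT on small blocks, bins angles at 0.5° and averages the angles in the winning bin. All function names and parameters (`dft2`, `block_skew`, `estimate_skew`, `synthetic_page`, `block_size`, `bin_width`) are hypothetical. Angles are measured in image coordinates (x to the right, y downwards).

```python
import cmath
import math
from collections import Counter


def dft2(block):
    """Plain 2-D DFT of a square block, computed as a row pass then a column pass."""
    n = len(block)
    w = [cmath.exp(-2j * math.pi * t / n) for t in range(n)]
    rows = [[sum(row[x] * w[(fx * x) % n] for x in range(n)) for fx in range(n)]
            for row in block]
    return [[sum(rows[y][fx] * w[(fy * y) % n] for y in range(n)) for fx in range(n)]
            for fy in range(n)]


def block_skew(block):
    """Angle (degrees) of the dominant line direction in one block, or None if blank."""
    n = len(block)
    spectrum = dft2(block)
    best, best_mag = None, 0.0
    # The spectrum of a real image is symmetric, so peaks come in pairs;
    # searching one half-plane picks one member of each pair.
    for fy in range(n // 2 + 1):
        for fx_index in range(n):
            fx = fx_index - n if fx_index > n // 2 else fx_index  # signed frequency
            if fy == 0 and fx <= 0:
                continue  # skip the DC term and the mirrored half of the fy == 0 row
            mag = abs(spectrum[fy][fx_index])
            if mag > best_mag:
                best, best_mag = (fx, fy), mag
    if best is None:
        return None
    fx, fy = best
    # Text lines run perpendicular to the dominant frequency vector (fx, fy).
    return math.degrees(math.atan2(-fx, fy))


def estimate_skew(image, block_size=32, bin_width=0.5):
    """Histogram the per-block angles and average those in the fullest bin."""
    height, width = len(image), len(image[0])
    angles = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            angle = block_skew(block)
            if angle is not None:
                angles.append(angle)
    if not angles:
        return None
    histogram = Counter(round(a / bin_width) for a in angles)
    winner = histogram.most_common(1)[0][0]
    votes = [a for a in angles if round(a / bin_width) == winner]
    return sum(votes) / len(votes)


def synthetic_page(size=64, period=32, u=1, v=8):
    """Sinusoidal 'text lines' with frequency vector (u, v) cycles per `period` pixels."""
    return [[math.cos(2 * math.pi * (u * x + v * y) / period) for x in range(size)]
            for y in range(size)]


page = synthetic_page()
skew = estimate_skew(page)  # line direction is atan2(-1, 8), expressed in degrees
```

Voting over blocks is what lets blocks dominated by graphics be outvoted by text blocks. A practical version would likely use an FFT library, window each block, and refine the peak location to reach sub-degree resolution.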
Many techniques have been reported for handwriting-based writer identification. Most such techniques assume that the written text is fixed (e.g., in signature verification). In this paper we attempt to eliminate this assumption by presenting a novel algorithm for automatic text-independent writer identification from non-uniformly skewed handwriting images. Given that the handwriting of different people is often visually distinctive, we take a global approach based on texture analysis, where each writer's handwriting is regarded as a different texture. In principle this allows us to apply any standard texture recognition algorithm to the task (e.g., the multi-channel Gabor filtering technique). Results of 96.0% accuracy on the classification of 150 test documents from 10 writers are very promising. The method is shown to be robust to noise and to the content of the text.