Achieving high accuracy and efficiency in South Indian character recognition systems is one of the principal goals pursued repeatedly in order to promote the use of optical character recognition (OCR) for South Indian languages such as Telugu. The process of character recognition comprises pre-processing, segmentation, feature extraction, classification and recognition. The feature extraction stage produces a representation that uniquely identifies each character image so that it can be classified. The selection of a feature extraction algorithm is critical for any image processing application, and in most cases it depends directly on the type of image objects to be identified. For South Indian OCR, the feature extraction technique plays a vital role in recognition accuracy because of the large character sets. In this work we focus on evaluating the performance of various feature extraction techniques in Telugu character recognition systems and analyze their efficiency and accuracy in recognizing the Telugu character set.
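To make the feature extraction stage concrete, the following is a minimal sketch of zoning, one commonly used OCR feature, in plain Python. It is an illustration only: the abstract does not name the techniques that were evaluated, so zoning, the function name `zoning_features`, the 2x2 grid and the sample glyph are all assumptions.

```python
def zoning_features(image, rows=2, cols=2):
    """Return the foreground-pixel density of each zone, in row-major order.

    `image` is a list of equal-length rows of 0/1 values (1 = ink).
    This is an illustrative feature, not the method of the paper.
    """
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        # Integer boundaries keep zones contiguous even when the image
        # size is not an exact multiple of the grid size.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            ink = sum(
                image[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            area = (bottom - top) * (right - left)
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4x4 binarized glyph: a stroke concentrated in the left half.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
features = zoning_features(glyph)
print(features)
```

The resulting vector has one density value per zone and would be passed to the classification stage. A finer grid captures more shape detail at the cost of a longer feature vector, which matters when the character set is large.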
Achieving high accuracy in South Indian character recognition is a truly interesting research challenge. In this paper, our investigation focuses on the recognition of one of the most widely used South Indian scripts, Kannada. In particular, the proposed experiment addresses the recognition of degraded character images extracted from ancient Kannada poetry documents, as well as handwritten character images collected in various unconstrained environments. The character images in the degraded documents are slightly blurry, which gives the characters a broken and messy appearance; this leads to conflicting behavior in the recognition algorithm, which in turn reduces recognition accuracy. The degraded character image samples are used to train a deep convolutional neural network known as AlexNet. The performance evaluation uses handwritten datasets gathered synthetically from users in the age groups 18-21, 22-25 and 26-30, and printed datasets extracted from ancient document images of Kannada poetry/literature. The datasets comprise around 497 classes; 428 classes include consonants, vowels, simple compound characters and complex compound characters. Each base character combined with consonant/vowel modifiers in handwritten text with overlapping/touching diacritics is treated as a separate class in Kannada script for our experimentation. However, compound characters that are non-overlapping/touching are still considered as individual classes, for which semantic analysis is carried out during the post-processing stage of OCR. AlexNet is reported to reach an accuracy of 91.3% in classifying printed character samples, and an accuracy of 92% is recorded for handwritten text.