The objective of this paper is to propose a new methodology for text recognition based on the features of segmented text components of images. The text classification algorithm is the main decision-making stage of a text recognition system. An artificial neural network approach has been used to train and test character classification based on the extracted features. Finally, the identified text is converted into a readable and editable text file.
Keywords: Text Extraction, Feature Extraction, Text Recognition, Neural Network, Back Propagation.
I. INTRODUCTION
The aim of text recognition is to recognize human-readable text characters in images and convert them into machine-readable characters. Classification is the main decision-making stage of a text recognition system: it uses the features extracted in the previous stage to identify each text component.
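Per the abstract and keywords, the classification stage is an artificial neural network trained with back propagation, but the network's architecture is not specified here. The following is a minimal sketch under stated assumptions, not the paper's implementation: one hidden layer of sigmoid units, one sigmoid output per character class, a squared-error loss, and per-sample gradient descent. The class `BackpropMLP`, its parameters, and the four-dimensional feature vectors are hypothetical placeholders for the paper's actual features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BackpropMLP:
    """Hypothetical one-hidden-layer network trained with plain back propagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        # One weight row per unit; the last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        self.lr = lr
        self.n_out = n_out

    def _forward(self, x):
        xb = list(x) + [1.0]  # input with bias term appended
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, label):
        """One forward/backward pass; returns the sample's squared error before the update."""
        xb, hb, y = self._forward(x)
        t = [1.0 if k == label else 0.0 for k in range(self.n_out)]
        # Output deltas for sigmoid units under squared error.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
        # Hidden deltas: propagate output deltas back through w2 (bias column skipped).
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(self.n_out))
                 for j in range(len(self.w1))]
        # Gradient-descent weight updates, applied after both deltas are computed.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * d_hid[j] * xb[i]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

    def fit(self, samples, labels, epochs=2000):
        """Train for a fixed number of epochs; returns the summed error of the last epoch."""
        loss = 0.0
        for _ in range(epochs):
            loss = sum(self.train_step(x, c) for x, c in zip(samples, labels))
        return loss

    def predict(self, x):
        """Return the index of the most strongly activated output unit (the class)."""
        _, _, y = self._forward(x)
        return max(range(self.n_out), key=lambda k: y[k])


# Toy, hypothetical feature vectors for three character classes.
X = [[1.0, 0.1, 0.0, 0.2], [0.9, 0.0, 0.1, 0.1],
     [0.0, 1.0, 0.2, 0.0], [0.1, 0.9, 0.0, 0.1],
     [0.1, 0.0, 1.0, 0.9], [0.0, 0.2, 0.9, 1.0]]
labels = [0, 0, 1, 1, 2, 2]

net = BackpropMLP(n_in=4, n_hidden=6, n_out=3)
final_loss = net.fit(X, labels, epochs=2000)
print("final loss:", final_loss)
print("predictions:", [net.predict(x) for x in X])
```

The class with the largest output activation is taken as the recognized character; in a full system each predicted index would then be mapped back to a character code before writing the editable text file.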
II. EXISTING TEXT RECOGNITION METHODS
The recognition (classification) stage is the main decision-making stage of a text recognition system. Various classifier techniques have been proposed in the literature for text recognition, including multi-level slice classifiers, minimum distance classifiers, maximum likelihood classifiers, fuzzy measures, artificial neural networks, support vector machines, and decision trees.
A robust method [1] used the convolutional co-occurrence histogram of oriented gradients (ConvCoHOG), which is more robust and discriminative than both the histogram of oriented gradients (HOG) and the co-occurrence histogram of oriented gradients (CoHOG). An image was first divided into smaller patches, and features were extracted from each patch separately. The gradient orientation of each pixel within a patch was quantized into histogram bins, and the normalized histograms were concatenated to form a feature vector, which was used to train a linear SVM classifier.
In the end-to-end method [2], individual characters were detected as Extremal Regions. The regions were first agglomerated into text lines by an efficient pruned exhaustive search that estimates the text direction on each triplet of regions; the constraints induced by the text direction contribute to the similarity measure used for clustering. Next, each region in a text line was labeled by a character recognition module trained on synthetic fonts. Regions with low confidence were rejected, which eliminates clutter regions included during text line formation. In the last step, a directed graph was constructed with a score assigned to each node and edge, the scores were normalized by the width of the area they represent, and a standard dynamic programming algorithm was used to select the path with the highest score.
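The final step of [2] is a highest-scoring-path search over a directed acyclic graph. The following is a minimal sketch of that kind of dynamic program, not the implementation from [2]. It assumes nodes are indexed left to right (every edge goes from a lower to a higher index), that node 0 and the last node are dummy start and end nodes, and that the node and edge scores passed in are already width-normalized. The function `best_path` and the toy labels and scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def best_path(node_score: List[float],
              edges: Dict[Tuple[int, int], float]) -> Tuple[float, List[int]]:
    """Highest-scoring path from node 0 to the last node of a left-to-right DAG.

    Assumes every edge (u, v) has u < v, so index order is a topological order,
    and that node and edge scores are already normalized.
    """
    n = len(node_score)
    best = [-math.inf] * n   # best path score ending at each node
    prev = [-1] * n          # predecessor on that best path
    best[0] = node_score[0]
    outgoing: Dict[int, List[Tuple[int, float]]] = {}
    for (u, v), s in edges.items():
        assert u < v, "edges must point left to right"
        outgoing.setdefault(u, []).append((v, s))
    for u in range(n):
        if best[u] == -math.inf:
            continue  # not reachable from the start node
        for v, s in outgoing.get(u, []):
            cand = best[u] + s + node_score[v]
            if cand > best[v]:
                best[v], prev[v] = cand, u
    if best[n - 1] == -math.inf:
        return -math.inf, []
    # Backtrack from the end node to recover the optimal path.
    path, v = [], n - 1
    while v != -1:
        path.append(v)
        v = prev[v]
    return best[n - 1], path[::-1]


# Hypothetical candidates: nodes 1 and 2 are competing labels for the first region.
labels = ["<s>", "c", "e", "a", "t", "</s>"]
node_score = [0.0, 0.9, 0.4, 0.8, 0.7, 0.0]
edges = {(0, 1): 0.0, (0, 2): 0.0, (1, 3): 0.5, (2, 3): 0.5,
         (2, 4): 0.3, (3, 4): 0.5, (4, 5): 0.0}
score, path = best_path(node_score, edges)
word = "".join(labels[i] for i in path[1:-1])
print(score, path, word)
```

Processing nodes in index order relaxes each edge exactly once, so the search is linear in the number of nodes plus edges. On the toy graph it selects the path through "c", "a", "t", preferring it over the weaker "e" candidate for the first region.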
The sequence of regions and their labels induced by the optimal path was the output of the method.
Gokhan Yildirim et al. [3] proposed a technique to detect and recognize text in a unified manner by searching for words directly, without reducing the image to text regions or individual charact...