Abstract-In this paper, we present a segmentation-based word spotting method for handwritten document images using the Co-occurrence Histograms of Oriented Gradients (Co-HOG) descriptor. The drawback of the Histogram of Oriented Gradients (HOG) descriptor is that it ignores the spatial information of adjacent pixels, whereas Co-HOG takes spatial contextual information into account by capturing the co-occurrence of orientation pairs of neighbouring pixels. To construct the Co-HOG descriptor for word spotting, we divide a word image into blocks, extract Co-HOG features from each block, and concatenate them to form a feature descriptor. The proposed method is evaluated using precision and recall metrics in experiments conducted on the popular George Washington (GW) dataset, and the results indicate that it outperforms the methods it is compared against on this dataset.
Keywords-Word spotting, Character Recognition, George Washington, Dynamic Time Warping, Hidden Markov Models
I. INTRODUCTION
Document image analysis has recently become a dynamic research field that draws the attention of researchers because of its complexity and the growing need to access the content of digitized information. Optical Character Recognition (OCR) has been explored for several decades with considerable success, helping to automate procedures previously carried out by humans. OCR techniques usually recognize words by processing characters individually and work well on machine-printed fonts against a clean background. Generally, large quantities of document images are accumulated in digital libraries, and processing these documents with OCR incurs a high computational cost because of the difficulty of understanding the page layout of digitized documents, irregular handwriting, faded ink, stained paper and other adverse factors. To overcome these problems, researchers have proposed word spotting, a relatively new alternative for text recognition and retrieval in digitized printed and handwritten documents.
Handwritten word spotting is a pattern classification task that consists of detecting a given query word in handwritten document images. Word spotting in handwritten documents is not completely solved because of the various challenges that such documents pose. Hence, we focus on word spotting in handwritten documents rather than printed documents. Generally, a word spotting method consists of three main modules: pre-processing, feature extraction and feature matching. Among them, feature extraction is one of the most important factors for achieving high retrieval performance, because features with strong discriminative information can be classified well even with the simplest classifier.
The literature survey reveals that the HOG descriptor is extensively used in numerous recognition applications because of its discriminative capability compared to other existing feature descriptors.
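The block-wise Co-HOG construction outlined in the abstract (divide the word image into blocks, extract co-occurrence features from each block, concatenate) can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions and not the implementation evaluated in this paper: the function names, the 2 x 4 block grid, the four pixel offsets and the 8 orientation bins are all illustrative choices, and the image is assumed to be a grayscale array larger than the grid and the offsets.

```python
import numpy as np


def quantized_orientations(image, n_bins=8):
    """Label each pixel with the bin index of its gradient orientation."""
    gy, gx = np.gradient(image.astype(float))
    angle = np.arctan2(gy, gx) % (2 * np.pi)  # map to [0, 2*pi)
    bins = np.floor(angle / (2 * np.pi) * n_bins).astype(int)
    return np.minimum(bins, n_bins - 1)  # guard against rounding up to n_bins


def cooccurrence(block, offset, n_bins=8):
    """Count orientation pairs (pixel, pixel + offset) inside one block."""
    dy, dx = offset
    h, w = block.shape
    hist = np.zeros((n_bins, n_bins))
    if abs(dy) >= h or abs(dx) >= w:
        return hist  # offset does not fit inside the block
    # Crop so that both the pixel and its offset neighbour lie in the block.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = block[y0:y1, x0:x1]
    b = block[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    np.add.at(hist, (a.ravel(), b.ravel()), 1)
    return hist


def cohog_descriptor(image, grid=(2, 4),
                     offsets=((0, 1), (1, 0), (1, 1), (1, -1)), n_bins=8):
    """Concatenate per-block co-occurrence matrices into one feature vector.

    grid, offsets and n_bins are assumed values chosen for illustration.
    """
    labels = quantized_orientations(image, n_bins)
    row_groups = np.array_split(np.arange(labels.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(labels.shape[1]), grid[1])
    features = []
    for r in row_groups:
        for c in col_groups:
            block = labels[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            for off in offsets:
                features.append(cooccurrence(block, off, n_bins).ravel())
    return np.concatenate(features)
```

With these assumed settings the descriptor has 2 x 4 blocks, 4 offsets and 8 x 8 orientation pairs per offset, giving 2048 dimensions. Unlike plain HOG, which would keep only an 8-bin histogram per block, each entry here records how often two particular orientations occur together at a fixed spatial offset.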
The HOG descriptor was developed by Dalal and Triggs. Importantly, HOG considers the orientation of only isolated pixels, whereas spatial information of adjacent pixels i...