In this paper we present an unsupervised, segmentation-free method for spotting and searching query words in images of handwritten Arabic documents. Histograms of Oriented Gradients (HOG) are used as feature vectors to represent the query and the document images. The descriptors are then compressed with product quantization. Finally, a better representation of the query is obtained using Support Vector Machines (SVM).
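The compression step named in the abstract above can be illustrated with a minimal product quantization sketch. It assumes the per-subspace codebooks have already been trained (for example with k-means, which is not shown); the function names `pq_encode` and `pq_asymmetric_distance` and the toy codebooks are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = List[List[float]]  # centroids of one sub-space (assumed pre-trained)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector: Vector, num_subspaces: int) -> List[List[float]]:
    """Cut a descriptor into equally sized, contiguous sub-vectors."""
    if len(vector) % num_subspaces != 0:
        raise ValueError("vector length must be divisible by num_subspaces")
    size = len(vector) // num_subspaces
    return [list(vector[i * size:(i + 1) * size]) for i in range(num_subspaces)]


def pq_encode(vector: Vector, codebooks: List[Codebook]) -> List[int]:
    """Replace each sub-vector by the index of its nearest centroid."""
    parts = split(vector, len(codebooks))
    return [
        min(range(len(book)), key=lambda k: squared_distance(part, book[k]))
        for part, book in zip(parts, codebooks)
    ]


def pq_asymmetric_distance(
    query: Vector, code: List[int], codebooks: List[Codebook]
) -> float:
    """Approximate squared distance between a raw query and an encoded vector."""
    parts = split(query, len(codebooks))
    return sum(
        squared_distance(part, book[index])
        for part, book, index in zip(parts, codebooks, code)
    )


# Toy example: 4-dimensional descriptors, 2 sub-spaces, 2 centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
code = pq_encode([0.9, 1.1, 0.1, 0.0], codebooks)
```

Each descriptor is stored as one small integer per sub-space instead of the full float vector, and the query is compared against the stored codes without being quantized itself.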
Old manuscripts are part of the rich cultural heritage of civilizations, and digitization is one way to preserve them. Handwriting recognition systems are developing rapidly and have become necessary for exploiting the wealth of information contained in ancient manuscripts. In this paper, a holistic approach for spotting and searching query words in images of handwritten Arabic documents is proposed. Performing these operations manually takes considerable time and effort. We first segment the handwritten document image into text lines using partial projection; a sliding-window approach is then used to locate the document regions that are most similar to the query. Histograms of Oriented Gradients (HOG) are used as feature vectors to represent the query and the document images, and Support Vector Machines (SVM) are used to produce a better representation of the query and to classify the feature vectors. Finally, applying a reclassification technique at the indexing stage leads to better results.
General Terms: Pattern Recognition.
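The text line segmentation based on partial projection mentioned in the abstract above can be sketched roughly as follows. This is a generic illustration rather than the authors' exact procedure: it assumes a binarized page stored as a list of rows with 1 for ink and 0 for background, and the names `partial_projection_lines`, `num_strips`, and `min_ink` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = ink, 0 = background (assumed)


def find_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row intervals where the profile reaches min_ink."""
    runs = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a text line begins
        elif ink < min_ink and start is not None:
            runs.append((start, y))  # the line ends just before row y
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def partial_projection_lines(
    image: Image, num_strips: int, min_ink: int = 1
) -> List[List[Tuple[int, int]]]:
    """Split the page into vertical strips and find line intervals per strip."""
    width = len(image[0])
    bounds = [i * width // num_strips for i in range(num_strips + 1)]
    result = []
    for left, right in zip(bounds, bounds[1:]):
        # Horizontal projection restricted to the columns of this strip.
        profile = [sum(row[left:right]) for row in image]
        result.append(find_runs(profile, min_ink))
    return result


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
lines_per_strip = partial_projection_lines(page, num_strips=2)
```

Computing the projection per strip instead of over the full page width makes the profile less sensitive to skewed or curved lines, which is the usual motivation for the partial variant.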
Similarity and distance measures are widely used to quantify the similarity or dissimilarity between vector sequences, and they play an important role in matching and pattern recognition, including the comparison of document images. In this paper, we survey the distance measures used in image matching and explain the limitations of the existing distances. We then introduce the concept of a floating distance, in which the decision threshold is selected separately for each word, based on a combination of linear regression and cosine distance. Experiments are carried out on handwritten Arabic document images from the Gallica library. These experiments show that the proposed floating distance outperforms the traditional distance in a word spotting system.
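One possible reading of the floating distance described above is a cosine distance compared against a threshold that is predicted per word by a fitted line. The abstract does not say which quantity the regression is fit on, so the sketch below assumes, purely for illustration, a single numeric property of the query word (called `word_feature` here); that predictor, the training values, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_match(
    query: Sequence[float],
    candidate: Sequence[float],
    word_feature: float,
    slope: float,
    intercept: float,
) -> bool:
    """Accept the candidate if its distance is below the per-word threshold."""
    threshold = slope * word_feature + intercept  # varies from word to word
    return cosine_distance(query, candidate) <= threshold


# Hypothetical training pairs: (word feature, best threshold found for it).
slope, intercept = fit_line([1.0, 2.0, 3.0], [0.2, 0.3, 0.4])
```

With a single global threshold, short and long words are judged by the same cutoff; letting the threshold depend on the word is what distinguishes this scheme from a fixed-distance decision.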
This paper presents a query-by-example word spotting method for handwritten Arabic documents based on the Scale Invariant Feature Transform (SIFT). It does not use any word or line segmentation, because segmentation errors affect the subsequent word representation. First, interest points are automatically extracted from the images with the SIFT detector; the SIFT descriptor is then used to represent each interest point. Finally, image regions are represented as histograms of visual words. The method is validated through a series of controlled experiments on handwritten Arabic document images.
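The final step above, representing a region as a histogram of visual words, can be sketched as follows. The sketch assumes that local descriptors have already been extracted and that a visual vocabulary (a list of centroid vectors, typically obtained by clustering) is available; the 2-dimensional toy descriptors stand in for real 128-dimensional SIFT descriptors, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Index of the visual word closest to the descriptor (Euclidean)."""
    def squared_distance(word: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(descriptor, word))

    return min(range(len(vocabulary)), key=lambda k: squared_distance(vocabulary[k]))


def bovw_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """Count descriptors per visual word and normalize the counts to sum to 1."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)  # region without interest points
    return [count / total for count in counts]


# Toy vocabulary of two visual words and four descriptors from one region.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
region_descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [0.0, 0.0]]
histogram = bovw_histogram(region_descriptors, vocabulary)
```

Because the histogram has a fixed length regardless of how many interest points a region contains, a query image and a document region can be compared directly with any vector distance.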