Evaluating statistical, transform-, model-, and structure-based texture features for document image retrieval (DIR).
Comparative analysis of the DIR results obtained from 26 texture features.
Computational time analysis of the texture features used for DIR.
In this paper, we investigate the usefulness of two different texture features, combined with classifier fusion, for document image retrieval. A local binary texture method (a statistical approach) and a wavelet analysis technique (a transform-based approach) are used for feature extraction, so that two feature vectors are obtained for every document image. For a given query, the similarity distances between each of its two feature vectors and the corresponding feature vectors extracted from the document images in the training step are computed separately. To exploit the properties of both features, a classifier fusion technique is then applied, using a weighted average of the distance measures obtained for each feature vector. Finally, the document images are ranked by their visual similarity to the query according to the fused similarity measure. The Media Team Document Database, which provides a great variety of page layouts and contents, is used to evaluate the proposed method. The experimental results show correct retrieval rates of 65.4% and 91.8% in the Top-1 and Top-10 ranked document lists, respectively.
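The fusion and ranking steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of chi-square distance for the binary texture histograms, Euclidean distance for the wavelet features, min-max normalisation of each distance list before averaging, and the weight `w` are all assumptions made for the example, and every function name is hypothetical.

```python
import math


def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms (assumed for the texture histograms)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def euclidean_distance(v1, v2):
    """Euclidean distance (assumed for the wavelet feature vectors)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def min_max_normalize(values):
    """Rescale distances to [0, 1] so the two feature types are comparable (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fused_ranking(query_tex, query_wav, db_tex, db_wav, w=0.5):
    """Rank database images by a weighted average of the two normalised distances.

    Returns database indices ordered from most to least similar to the query.
    The weight w is a hypothetical parameter; 1 - w is given to the wavelet distance.
    """
    d_tex = min_max_normalize([chi_square_distance(query_tex, h) for h in db_tex])
    d_wav = min_max_normalize([euclidean_distance(query_wav, v) for v in db_wav])
    fused = [w * a + (1.0 - w) * b for a, b in zip(d_tex, d_wav)]
    return sorted(range(len(fused)), key=lambda i: fused[i])


def top_k_hit(ranking, relevant_index, k):
    """True if the relevant document appears among the first k ranked results."""
    return relevant_index in ranking[:k]


if __name__ == "__main__":
    # Toy features for three database images (invented values, for illustration only).
    db_tex = [[0.5, 0.5, 0.0], [0.1, 0.1, 0.8], [0.4, 0.6, 0.0]]
    db_wav = [[1.0, 2.0], [5.0, 5.0], [1.2, 2.1]]
    ranking = fused_ranking([0.45, 0.55, 0.0], [1.1, 2.0], db_tex, db_wav)
    print(ranking)  # → [0, 2, 1]
```

Normalising each distance list before averaging keeps one feature type from dominating simply because its distances live on a larger scale; Top-1 and Top-10 rates like those quoted above would then be the fraction of queries for which `top_k_hit` is true with `k=1` and `k=10`.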
Current technology is moving towards a paperless world. Due to the rapid increase in digitized documents, fast and easy retrieval methods are in high demand. The aim of this paper is to examine the effectiveness of texture features for document image retrieval. To this end, a segmentation-free document image retrieval approach based on a binary texture method is proposed. In the proposed approach, local features are extracted, local grey-level structures are summarised, and their distribution is characterised using global features. The underlying assumption is that the texture properties of the text regions and non-text regions of document images differ. This assumption is used to rank the available document images and retrieve only those that have the greatest visual similarity to a given query. The under-sampled image and sub-images of the original image are also used to improve retrieval, yielding correct retrieval rates of up to 76.0% in the Top-1 ranking and 96.2% in the Top-10 ranking. The Media Team Oulu Document Database, a heterogeneous database that offers a great variety of page layouts and contents, is used for experimentation.
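The idea of summarising local grey-level structures and characterising their distribution globally can be sketched with the basic 8-neighbour local binary pattern (LBP) operator, used here as an assumed representative of a binary texture method; the paper's exact operator and parameters may differ. The sketch also concatenates histograms from a 2× under-sampled copy and four quadrant sub-images; the factor of 2 and the quadrant split are hypothetical choices, and all function names are illustrative.

```python
# Neighbour offsets in a fixed clockwise order; each neighbour contributes one bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP: compare the 8 neighbours of each interior pixel with the
    centre value and pack the comparison results into an 8-bit code."""
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: the global texture descriptor."""
    hist = [0.0] * 256
    codes = lbp_codes(img)
    for code in codes:
        hist[code] += 1
    n = len(codes)
    return [v / n for v in hist] if n else hist


def undersample(img, factor=2):
    """Keep every `factor`-th row and column (hypothetical under-sampling scheme)."""
    return [row[::factor] for row in img[::factor]]


def quadrants(img):
    """Split the image into four sub-images (hypothetical sub-image scheme)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [row[:w] for row in img[:h]],
        [row[w:] for row in img[:h]],
        [row[:w] for row in img[h:]],
        [row[w:] for row in img[h:]],
    ]


def multiscale_descriptor(img):
    """Concatenate LBP histograms of the full image, its under-sampled copy,
    and its four quadrants into one feature vector (6 x 256 values)."""
    feature = []
    for part in [img, undersample(img)] + quadrants(img):
        feature.extend(lbp_histogram(part))
    return feature
```

The resulting vectors can then be compared with a histogram distance and ranked as in any nearest-neighbour retrieval scheme. Working on a grey-level image given as a list of rows keeps the sketch dependency-free; a practical system would more likely use an array library for speed.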
Due to the rapid growth in the number of digitized documents, a system that automatically retrieves document images from a large collection of structured and unstructured document images is in high demand. Many techniques have been proposed in the literature to provide efficient and effective ways of retrieving and organizing these document images. This paper provides an overview of the methods applied to document image retrieval in recent years. The review finds that, from a textual perspective, more attention has been paid to feature extraction methods that do not rely on OCR.