Abstract-In this work, we propose an indexing model for script identification. A Rectangular White Space analysis algorithm is used to analyze and identify heterogeneous layouts of document images. To speed up script identification, we focus on designing an indexing mechanism for tri-lingual scripts that optimizes the subsequent robust identification system. For representation, we extract features from Gabor responses and from the scale-invariant feature transform (SIFT). We consider a set of global features and index them with a Kd-tree. For experimentation, we used our own database. Experimental results show that, for these scripts, indexing prior to identification is faster than the conventional identification method.
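The indexing step can be pictured with a minimal Kd-tree sketch. It is only an illustration under stated assumptions: the three-dimensional feature vectors, the labels `script_A`, `script_B`, `script_C`, and the helper names `build_kdtree` and `nearest` are hypothetical and do not come from the paper, which uses global features derived from Gabor responses and SIFT.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Build a Kd-tree from a list of (feature_vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # median point splits the space
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    vec, label = node["point"]
    d = dist(vec, query)
    if best is None or d < best[0]:
        best = (d, label)
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical global feature vectors, one label per script class.
features = [
    ((0.10, 0.80, 0.30), "script_A"),
    ((0.15, 0.75, 0.35), "script_A"),
    ((0.70, 0.20, 0.60), "script_B"),
    ((0.65, 0.25, 0.55), "script_B"),
    ((0.40, 0.50, 0.90), "script_C"),
    ((0.45, 0.55, 0.85), "script_C"),
]

tree = build_kdtree(features)
query = (0.12, 0.78, 0.33)
distance, label = nearest(tree, query)
```

The pruning test in `nearest` is what makes an index lookup cheaper than comparing the query against every stored vector; it returns the same answer as a linear scan.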
In this work, we review the performance of texture features for script classification. A Rectangular White Space analysis algorithm is used to analyze and identify heterogeneous layouts of document images. The texture features, namely color texture moments, local binary patterns (LBP), and the responses of Gabor, LM, S, and R filters, are extracted, and combinations of these are considered in the classification. A probabilistic neural network and a nearest-neighbor classifier are used for classification. To corroborate the adequacy of the proposed strategy, an experiment was conducted on our own data set. To study the effect of database size on classification accuracy, we vary the size of the database; the results show that combining multiple features substantially improves performance.
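As an illustration of one of the listed texture features, the following sketch computes a basic 3x3 LBP histogram and classifies it with a nearest-neighbor rule. It is a sketch under assumptions: the toy images, the labels `flat` and `stripes`, the L1 distance, and the helper names are hypothetical, and the paper's actual LBP variant and feature combinations are not specified here.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(image):
    """Return a normalised 256-bin histogram of basic 3x3 LBP codes.

    image is a list of rows of grey levels and must be at least 3x3.
    """
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]

def nearest_neighbour(train, query_hist):
    """Return the label of the training histogram closest in L1 distance."""
    def l1(item):
        return sum(abs(a - b) for a, b in zip(item[0], query_hist))
    return min(train, key=l1)[1]

# Hypothetical toy textures standing in for script samples.
flat = [[5] * 4 for _ in range(4)]
stripes = [[0, 9, 0, 9] for _ in range(4)]
train = [(lbp_histogram(flat), "flat"), (lbp_histogram(stripes), "stripes")]

query = [[7] * 4 for _ in range(4)]
predicted = nearest_neighbour(train, lbp_histogram(query))
```

Because LBP compares each pixel only with its neighbours, the uniform query image matches the `flat` sample even though its grey level differs.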
Abstract-Physical layout analysis studies the arrangement and location of the regions present in a document image before the document is interpreted. Before text or other information is extracted from a document image, page segmentation (layout analysis) techniques must be applied to identify the exact layout (area) where the text or image resides. In page segmentation, top-down methods are simple and efficient but fail on non-Manhattan layouts. In contrast, bottom-up approaches adapt to non-Manhattan layouts more easily than top-down approaches, but depend heavily on thresholds, parameters, and extensive computation for layout identification. Hybrid methods (Bruel [31], Bruel [32]), on the other hand, are well suited to layout identification because they eliminate the dependency on thresholds and parameters. However, they analyze the white background of the image using small white rectangles and merge these to locate the content blocks. Merging makes the identification process tedious, since a large number of small white rectangles is involved. In addition, this approach relies heavily on heuristics for the merging operations, which affects the segmentation rate considerably. All of the bottom-up and hybrid approaches reported above also require connected component analysis, which involves a large number of pixel visits, to identify the black and white components of the image. These shortcomings motivated this research towards a white space analysis technique that eliminates the use of connected component analysis (to identify white spaces), heuristics, thresholds, and prior knowledge. As a result, this thesis proposes the Rectangular White Space Analysis (RWSA) technique, which captures all the white spaces in a single scan over the image with a minimum number of pixel visits and merges them, without heuristic or threshold assumptions, to segment the layouts.
Moreover, this thesis proposes two statistical properties for separating the text blocks and images within the identified layouts; this hybrid approach is explained in the subsequent section.
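The idea of collecting white spaces in a single scan can be illustrated with a small sketch. This is not the RWSA algorithm itself, whose details are not given in the abstract; it is a simplified stand-in that assumes a binary image (0 for white, 1 for ink), visits each pixel once, and stacks white runs that share the same column span on consecutive rows into rectangles. The function names and the sample page are hypothetical.

```python
def white_runs(row):
    """Return (start, end) column spans of consecutive white (0) pixels."""
    runs, start = [], None
    for c, pixel in enumerate(row):
        if pixel == 0 and start is None:
            start = c                      # a white run begins
        elif pixel != 0 and start is not None:
            runs.append((start, c - 1))    # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs

def white_rectangles(image):
    """Scan rows top to bottom once and return white rectangles.

    Each rectangle is (top, left, bottom, right), inclusive. Runs with an
    identical column span on consecutive rows are stacked into one rectangle.
    """
    open_rects = {}   # (left, right) -> top row of a rectangle still growing
    done = []
    for r, row in enumerate(image):
        current = set(white_runs(row))
        for span in list(open_rects):
            if span not in current:        # this rectangle stopped growing
                top = open_rects.pop(span)
                done.append((top, span[0], r - 1, span[1]))
        for span in current:
            open_rects.setdefault(span, r)
    last = len(image) - 1
    for span, top in open_rects.items():   # close rectangles reaching the bottom
        done.append((top, span[0], last, span[1]))
    return sorted(done)

# Hypothetical page: two ink blocks separated by a white column gutter.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
rects = white_rectangles(page)
```

On the sample page this yields the top and bottom margins, the left and right margins, and the gutter between the two ink blocks, without any connected component labelling or threshold.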