This paper presents an automatic, optimized document skew correction approach based on the Hough transform, developed as part of a character segmentation algorithm. Skew correction matters in document image analysis because further processing is impossible if the document image is skewed. The proposed approach combines a fast implementation of the standard Hough transform with a highly optimized low-level machine code implementation of image rotation. To achieve high computational performance, a linear image representation is used. The results of the proposed approach are analyzed in terms of time complexity and skew estimation accuracy and compared with existing skew correction approaches. The proposed approach gives better results than the analogous approach used in related work, but worse results than an optimized version that exploits a BAG algorithm. The results show a significant improvement over the standard Hough transform implementation.
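As a rough illustration of Hough-based skew estimation in general, and not the optimized implementation described in the abstract, the following sketch votes over a small range of candidate angles. The function name `estimate_skew`, the angle range, the step size, and the sum-of-squares score are all assumptions made for this example.

```python
import math
from collections import Counter

def estimate_skew(points, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) that best aligns the points.

    points: iterable of (x, y) coordinates of foreground pixels.
    Hypothetical helper: for each candidate angle, every point votes for
    the line offset rho = y*cos(a) - x*sin(a), i.e. a Hough accumulator
    restricted to near-horizontal lines.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # One accumulator column: votes per quantized offset rho.
        acc = Counter(round(y * c - x * s) for x, y in points)
        # Sharp peaks (points concentrated on few lines) give a high score.
        score = sum(v * v for v in acc.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The returned angle is measured in the coordinate system of the input points; rotating the image by its negative would then deskew it. A production implementation would typically subsample the pixels or precompute sine and cosine tables to reduce the cost of voting.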
This paper presents efficient new image compression and decompression methods for document images, intended for use in the pre-processing stage of an OCR system designed for the needs of the "Nikola Tesla Museum" in Belgrade. The proposed compression methods exploit the Run-Length Encoding (RLE) algorithm and an algorithm based on document character contour extraction, while an iterative scanline fill algorithm is used for decompression. The methods are compared with the JBIG2 and JPEG2000 image compression standards. Segmentation accuracy results for ground-truth documents are obtained in order to evaluate the proposed methods. Results show that the proposed methods outperform JBIG2 compression in terms of time complexity, providing up to 25 times lower processing time at the expense of a worse compression ratio, and outperform the JPEG2000 image compression standard, providing up to a 4-fold improvement in compression ratio. Finally, the time complexity results show that the presented methods are sufficiently fast for a real-time character segmentation system.
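To make the RLE idea concrete, here is a minimal sketch of run-length encoding for a single row of a binary document image. It is a generic illustration under assumed conventions (runs alternate starting with white, 0 = white, 1 = black), not the encoding format, contour-based method, or scanline fill decompression used in the paper; `rle_encode` and `rle_decode` are hypothetical names.

```python
def rle_encode(row):
    """Encode a row of 0/1 pixels as alternating run lengths.

    Assumed convention: the first run counts white (0) pixels, so a row
    that starts with black begins with a run of length 0.
    """
    runs = []
    current, count = 0, 0
    for px in row:
        if px == current:
            count += 1
        else:
            runs.append(count)
            current, count = px, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Rebuild the pixel row from alternating white/black run lengths."""
    row = []
    value = 0
    for n in runs:
        row.extend([value] * n)
        value = 1 - value
    return row
```

For example, the row `[0, 0, 1, 1, 1, 0, 1]` encodes to `[2, 3, 1, 1]`. Document images tend to contain long white runs, which is why this kind of scheme can be both compact and fast on them.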