Contrary to popular belief, Optical Character Recognition (OCR) remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. In this paper, we present a segmentation-free OCR system that combines deep learning methods, synthetic training data generation, and data augmentation techniques. We render synthetic training data using large text corpora and over 2,000 fonts. To simulate text occurring in complex natural scenes, we augment the extracted samples with geometric distortions and with a proposed data augmentation technique: alpha-compositing with background textures. Our models employ a convolutional neural network encoder to extract features from text images. Inspired by the recent progress in neural machine translation and language modeling, we examine the capabilities of both recurrent and convolutional neural networks in modeling the interactions between input elements. The proposed OCR system surpasses the accuracy of leading commercial and open-source engines on distorted text samples.
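The alpha-compositing augmentation can be pictured as blending a rendered text sample over a background texture. The sketch below is a minimal illustration using Pillow; the blending factor, function name, and texture source are assumptions for illustration, not the parameters or implementation used in the paper.

```python
# Illustrative sketch of alpha-compositing a rendered text sample with a
# background texture. Names and the default alpha are hypothetical.
from PIL import Image


def composite_with_texture(text_img: Image.Image,
                           texture_img: Image.Image,
                           alpha: float = 0.6) -> Image.Image:
    """Blend a rendered text sample over a background texture.

    alpha controls the opacity of the text layer; the default here is an
    assumed value, not one taken from the paper.
    """
    # Resize the texture to match the text sample and normalize modes.
    texture = texture_img.convert("RGB").resize(text_img.size)
    text = text_img.convert("RGB")

    # Alpha-composite: out = alpha * text + (1 - alpha) * texture.
    return Image.blend(texture, text, alpha)


# Example usage (paths are placeholders):
# sample = composite_with_texture(Image.open("rendered_text.png"),
#                                 Image.open("texture.jpg"))
```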
Print media collections of considerable size are held by cultural heritage organizations and will soon be subject to digitization activities. However, technical content quality management in digitization workflows still relies heavily on human monitoring. This heavy human intervention is cost-intensive and time-consuming, which makes automation essential. In this article, a new automatic quality assessment and improvement system is proposed. The digitized source image and the color reference target are extracted from the raw digitized images by an automatic segmentation process. The target is evaluated by a reference-based algorithm, while no-reference quality metrics are applied to the source image. Experimental results are provided to illustrate the performance of the proposed system. We show that it performs well in both the extraction and the quality assessment steps compared to the state of the art. The impact of efficient and dedicated quality assessors on the optimization step is extensively documented.
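As a rough illustration of the kind of no-reference quality measure such a system might apply to the source image, the sketch below scores sharpness as the variance of the Laplacian; the metrics used in the proposed system may differ, and the threshold in the usage comment is hypothetical.

```python
# Minimal sketch of a no-reference quality metric: sharpness via the variance
# of the Laplacian. Shown only as an illustrative stand-in for the paper's
# quality assessors.
import numpy as np
from PIL import Image
from scipy import ndimage


def sharpness_score(image_path: str) -> float:
    """Return the variance of the Laplacian of a grayscale image.

    Higher values indicate stronger edge response; unusually low values can
    flag blurred scans for review.
    """
    gray = np.asarray(Image.open(image_path).convert("L"), dtype=np.float64)
    laplacian = ndimage.laplace(gray)  # second-derivative edge response
    return float(laplacian.var())


# Example (threshold is hypothetical):
# if sharpness_score("scan_0001.tif") < 50.0:
#     print("possible blur, route to manual review")
```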
Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the first processing steps, skew detection and correction has a heavy influence on all further document analysis modules, such as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of accurately detecting the global skew angle of document images within the range [−90°, 90°]. Using the same framework, the algorithm is then extended for Roman script documents so as to cope with the full range [−180°, 180°) of possible skew angles. Despite its generality, the improved algorithm is very fast and requires no explicit parameters. Experiments on a combined test set comprising around 110,000 real-life images show the accuracy and robustness of the proposed method.
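For intuition, a common baseline for global skew estimation scores candidate angles by the variance of the horizontal projection profile, since well-aligned text lines produce sharply peaked profiles. The sketch below illustrates that generic idea over a small, assumed search range; it is not the scale-independent, parameter-free algorithm proposed in the paper.

```python
# Projection-profile baseline for global skew estimation. The search range,
# step size, and function name are assumptions for illustration only.
import numpy as np
from scipy import ndimage


def estimate_skew(binary: np.ndarray,
                  angle_range: float = 15.0,
                  step: float = 0.5) -> float:
    """Return the angle (degrees) maximizing the variance of the row profile.

    binary: 2-D array with text pixels as 1 and background as 0.
    angle_range and step are hypothetical search parameters.
    """
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = ndimage.rotate(binary, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)   # row-wise ink counts
        score = profile.var()           # peaky profile => aligned text lines
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


# The detected skew can then be undone with
# ndimage.rotate(image, -estimate_skew(binary), reshape=False).
```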