Requirements for the objective evaluation of automated data-entry systems are presented. Because the cost of correcting errors dominates the document conversion process, the most important characteristic of an OCR device is accuracy. However, different measures of accuracy (error metrics) are appropriate for different applications and at the character, word, text-line, text-block, and document levels. For wholly objective assessment, OCR devices must be tested under programmed, rather than interactive, control.
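As an illustration of how accuracy can differ by level, the following minimal sketch computes a character-level and a word-level accuracy from the same pair of strings. It assumes accuracy is defined as one minus the edit distance divided by the reference length, which is one common choice and not necessarily the metric the paper proposes. The function names `edit_distance`, `character_accuracy`, and `word_accuracy` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _accuracy(ref, hyp):
    # Assumed definition: 1 - (edit distance / reference length), floored at 0.
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def character_accuracy(ref_text, ocr_text):
    """Accuracy measured over individual characters."""
    return _accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    """Accuracy measured over whitespace-separated words."""
    return _accuracy(ref_text.split(), ocr_text.split())


reference = "different measures of accuracy"
ocr_output = "diflerent measures of accuracy"
print(character_accuracy(reference, ocr_output))
print(word_accuracy(reference, ocr_output))
```

A single misread character costs little at the character level but invalidates a whole word at the word level, which is why the appropriate metric depends on the application.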
A new projection-profile-based algorithm is presented that extracts the fiducial points needed to estimate a skew angle by decoding a JBIG-compressed image. This algorithm and three other projection-profile-based algorithms were tested using 460 page images and 1,246 single-column text zones extracted from the page images. Linear regression analyses of the experimental results showed that the new algorithm performed competitively with the other three algorithms.
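For readers unfamiliar with projection-profile skew estimation, the sketch below shows the general idea on a list of black-pixel coordinates. It is a generic brute-force variant, not the JBIG-based algorithm described above: it uses every black pixel rather than extracted fiducial points, and it assumes a sum-of-squares sharpness criterion and a shear approximation to rotation. All names (`profile_score`, `estimate_skew`, `synthetic_lines`) and parameters are hypothetical.

```python
import math
from collections import Counter


def profile_score(pixels, angle_deg):
    """Sharpness of the horizontal projection profile after undoing a skew.

    Each pixel (x, y) is sheared back by the candidate angle, the pixels are
    counted per row, and the sum of squared row counts is returned. The score
    is highest when text lines collapse onto the fewest rows.
    """
    slope = math.tan(math.radians(angle_deg))
    profile = Counter(round(y - x * slope) for x, y in pixels)
    return sum(count * count for count in profile.values())


def estimate_skew(pixels, max_angle=5.0, step=0.5):
    """Return the candidate angle (in degrees) with the sharpest profile."""
    n_steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + k * step for k in range(n_steps + 1)]
    return max(candidates, key=lambda a: profile_score(pixels, a))


def synthetic_lines(angle_deg, n_lines=5, width=200, thickness=3, spacing=20):
    """Generate black pixels for straight 'text lines' skewed by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(line * spacing + t + x * slope))
            for line in range(n_lines)
            for t in range(thickness)
            for x in range(width)]


# Usage: recover the skew of a synthetic page skewed by 2 degrees.
pixels = synthetic_lines(2.0)
print(estimate_skew(pixels))
```

The exhaustive search over candidate angles keeps the sketch short. Practical methods reduce the cost by working from a small set of fiducial points instead of all pixels, which is the role the fiducial-point extraction plays in the abstract above.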