Optical scan voting is considered by many to be the most trustworthy option for conducting elections because it provides an independently verifiable record of each voter's intent. While op-scan technology has been in use for decades, attempts to improve the machine-reading of ballots raise a range of interesting issues in document image analysis. Work thus far has been hindered by a lack of real-world data, however, since ballots associated with actual elections are kept secure from the public and are normally destroyed after a period of time. Fortunately, as a result of a recent challenged federal election in Minnesota, a large number of op-scan ballot images were made available for public inspection on the Web. In this paper, we present the Minnesota op-scan ballot collection as a unique resource for the document analysis community. We discuss important considerations regarding the definitions of a legal vote and a valid ballot that cannot be ignored for the sake of technical expediency. We also describe our efforts to annotate the collection, including the development of a graphical tool for collecting ground-truth interpretations and the annotation protocol now being employed. The collection, consisting of ballot images, file formats, and associated truth data for part of the set, is being made openly available to facilitate research in this important area.
Abstract. Document images are degraded through bilevel processes such as scanning, printing, and photocopying. The resulting image degradations can be categorized based either on observable degradation features or on degradation model parameters, and the degradation features can be related mathematically to the model parameters. In this paper we statistically compare pairs of populations of degraded character images created with different model parameters. As the model parameters vary, the change in the probability that the characters come from different populations correlates with the relationship between the observable degradation features and the model parameters. The paper also shows which features have the largest impact on the image.
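The population comparison described above can be illustrated with a small, hypothetical sketch. It assumes a simple bilevel degradation model chosen only for illustration: a 1D scanline across a black stroke is blurred by a Gaussian point spread function of width `psf_width`, perturbed by additive Gaussian noise, and binarized at a global `threshold`. The count of ink pixels stands in for an observable degradation feature (stroke width), and a two-sample permutation test on the mean stands in for the statistical comparison. None of these choices, nor the function names `degrade_scanline`, `sample_population`, and `permutation_p_value`, are taken from the paper.

```python
import math
import random
from statistics import mean


def degrade_scanline(stroke_width, psf_width, threshold, noise_sd, rng, length=41):
    """Degrade one scanline crossing a black stroke (1 = ink).

    Hypothetical bilevel model (an assumption, not the paper's model):
    ideal binary profile -> Gaussian blur (std dev psf_width)
    -> additive Gaussian noise (std dev noise_sd) -> global threshold.
    """
    start = (length - stroke_width) // 2
    ideal = [1.0 if start <= i < start + stroke_width else 0.0 for i in range(length)]
    # Truncated, normalized Gaussian kernel covering +/- 3 standard deviations.
    radius = max(1, int(3 * psf_width))
    kernel = [math.exp(-(k * k) / (2 * psf_width ** 2)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    out = []
    for i in range(length):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius  # neighbor at offset (j - radius)
            if 0 <= idx < length:
                acc += w * ideal[idx]
        acc += rng.gauss(0.0, noise_sd)
        out.append(1 if acc >= threshold else 0)
    return out


def sample_population(stroke_width, psf_width, threshold, noise_sd, n, rng):
    """Measured stroke widths (ink-pixel counts) for n independently degraded scanlines."""
    return [sum(degrade_scanline(stroke_width, psf_width, threshold, noise_sd, rng))
            for _ in range(n)]


def permutation_p_value(a, b, rng, n_perm=2000):
    """Two-sided permutation test on the difference of means of samples a and b."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Same blur and noise, different binarization thresholds.
    pop_a = sample_population(6, 1.5, 0.5, 0.1, 200, rng)
    pop_b = sample_population(6, 1.5, 0.25, 0.1, 200, rng)
    print(mean(pop_a), mean(pop_b), permutation_p_value(pop_a, pop_b, rng))
```

Under a fixed blur, lowering the threshold widens the binarized stroke, so the two populations separate and the p-value drops. Repeating the comparison over a grid of parameter pairs is one way to see how the probability of "different populations" tracks each parameter in this toy model.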