This is an author-produced, peer-reviewed version of this article.
ABSTRACT

Information in historical datasets comes in many forms. We are working with a set of World War I era postcards that contain handwritten text, some preprinted text, postage stamps, and postmark/cancellation stamps. The postmarks are of considerable interest to collectors looking for images of samples they have not previously seen. The postmarks also provide information on the originating location of the card that complements the information in the address block.

The postmarks vary considerably in town and date, but also in style. The styles can be grouped into categories. We describe a method for automatically extracting a template for each category of postmark. The problem is complicated by the high levels of degradation present in the cards. The approach uses a cascade of unsupervised learning steps separated by image cleaning. The cascade introduces averaging steps, which reduce noise, and it reduces the number of comparisons between samples. While merges happen at each stage, the number of merges needed within each stage is reduced. Once extracted, the templates can be used to group the postmarks. They also contribute information about postmark content that helps separate the postmark from the paper and other interfering marks, so that further information about the postmarks and postcards can be extracted.
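
The cascade idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy clustering rule, the normalized-correlation similarity, the clipping used as a stand-in for image cleaning, and the names `similarity`, `cluster_stage`, `clean`, and `cascade` are all assumptions made for illustration. Images are assumed to be equally sized, already aligned, and scaled to roughly [0, 1].

```python
import numpy as np

def similarity(a, b):
    """Normalized correlation between two equally sized images (assumed measure)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def cluster_stage(images, weights, threshold):
    """Greedy clustering within one batch.

    Each image joins its best-matching template if the similarity reaches
    the threshold, otherwise it starts a new template. A template is the
    weighted average of its members, which is the averaging step that
    suppresses noise.
    """
    templates, counts = [], []
    for img, w in zip(images, weights):
        scores = [similarity(img, t) for t in templates]
        if scores and max(scores) >= threshold:
            k = int(np.argmax(scores))
            templates[k] = (templates[k] * counts[k] + img * w) / (counts[k] + w)
            counts[k] += w
        else:
            templates.append(img.astype(float))
            counts.append(w)
    return templates, counts

def clean(template):
    """Hypothetical stand-in for image cleaning: clip intensities to [0, 1]."""
    return np.clip(template, 0.0, 1.0)

def cascade(images, batch_size=8, threshold=0.5):
    """Cluster in small batches, clean the templates, then cluster the templates.

    Working batch by batch means each sample is compared only with the
    templates of its own batch, not with every other sample.
    """
    items = [np.asarray(im, dtype=float) for im in images]
    weights = [1.0] * len(items)
    while True:
        single_batch = len(items) <= batch_size
        merged, merged_w = [], []
        for start in range(0, len(items), batch_size):
            stop = start + batch_size
            t, w = cluster_stage(items[start:stop], weights[start:stop], threshold)
            merged.extend(clean(x) for x in t)
            merged_w.extend(w)
        if len(merged) == len(items):
            batch_size *= 2  # nothing merged at this stage: widen the batches
        items, weights = merged, merged_w
        if single_batch:
            return items, weights
```

Calling `cascade(samples)` on a list of 2-D arrays returns the surviving templates together with the number of original samples averaged into each. The threshold and batch size are illustrative defaults and would need tuning for degraded postmark images.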