In this article, we present an algorithmic system for determining the proper correspondence between place markers and their labels in historical maps. We assume that the locations of place markers (usually pictographs) and labels (pieces of text) have already been determined, either algorithmically or by hand, and we want to match the labels to the markers. This time-consuming step in the digitization process of historical maps is nontrivial even for humans but provides valuable metadata (e.g., when subsequently georeferencing the map). To speed up this process, we model the problem in terms of combinatorial optimization, solve that problem efficiently, and show how user interaction can be used to improve the quality of the results. We also consider a version of the model where we are given label fragments and additionally have to decide which fragments go together. We show that this problem is NP-hard. However, we give a polynomial-time algorithm for a restricted version of this fragment assignment problem. We have implemented the algorithm for the main problem and tested it on a manually extracted ground truth for eight historical maps with a combined total of more than 12,800 markers and labels. On average, the algorithm correctly matches 96% of the labels and is robust against noisy input. It furthermore performs a sensitivity analysis and in this way computes a measure of confidence for each of the matches. We use this as the basis for an interactive system where the user’s effort is directed to checking those parts of the map where the algorithm is unsure; any corrections the user makes are propagated by the algorithm. We discuss a prototype of this system and statistically confirm that it successfully locates those areas on the map where the algorithm needs help.
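To make the idea of matching as combinatorial optimization concrete, the following is a minimal sketch and not the method of the article. It assumes that markers and labels are given as 2D points, that the cost of a match is the Euclidean distance between them, and that there are at least as many labels as markers. The function names `best_assignment` and `match_confidence` are hypothetical. The search is brute force, so it is only suitable for a handful of points. The confidence value is one simple form of sensitivity analysis: how much the total cost grows when a given match is forbidden.

```python
from itertools import permutations
from math import dist, inf


def best_assignment(markers, labels, forbidden=frozenset()):
    """Return (cost, assignment) minimizing the total marker-label distance.

    assignment[i] is the index of the label matched to marker i.
    Pairs (marker_index, label_index) listed in `forbidden` are not allowed.
    Brute force over all injective assignments: tiny inputs only.
    """
    best_cost, best_perm = inf, None
    for perm in permutations(range(len(labels)), len(markers)):
        if any((i, j) in forbidden for i, j in enumerate(perm)):
            continue
        cost = sum(dist(markers[i], labels[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


def match_confidence(markers, labels):
    """Return a list of (marker_index, label_index, margin) triples.

    The margin is the extra total cost of the best assignment that avoids
    this particular match. A small margin means a competing assignment is
    almost as good, so the match deserves a closer look by the user.
    """
    cost, perm = best_assignment(markers, labels)
    if perm is None:  # no feasible assignment (e.g. fewer labels than markers)
        return []
    result = []
    for i, j in enumerate(perm):
        alt_cost, _ = best_assignment(markers, labels, forbidden={(i, j)})
        result.append((i, j, alt_cost - cost))
    return result


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only for illustration.
    markers = [(0, 0), (10, 0), (20, 0)]
    labels = [(1, 1), (11, 1), (15, 1)]
    for marker, label, margin in match_confidence(markers, labels):
        print(f"marker {marker} -> label {label}, margin {margin:.2f}")
```

In this toy input the third label sits between the second and third markers, so the matches for those two markers get a smaller margin than the match for the first marker. An interactive tool could sort matches by margin and present the lowest ones to the user first.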