Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and the large data volume of digital map archives, which can hold thousands of digitized map sheets. Traditional map processing techniques typically rely on manually collected templates of the symbol of interest, and thus are not suitable for large-scale information extraction. In order to digitally preserve such large amounts of valuable retrospective geographic information, high levels of automation are required. Herein, we propose an automated machine-learning based framework to extract human settlement symbols, such as buildings and urban areas from historical topographic maps in the absence of training data, employing contemporary geospatial data as ancillary data to guide the collection of training samples. These samples are then used to train a convolutional neural network for semantic image segmentation, allowing for the extraction of human settlement patterns in an analysis-ready geospatial vector data format. We test our method on United States Geological Survey historical topographic maps published between 1893 and 1954. The results are promising, indicating high degrees of completeness in the extracted settlement features (i.e., recall of up to 0.96, F-measure of up to 0.79) and will guide the next steps to provide a fully automated operational approach for large-scale geographic feature extraction from a variety of historical map series. Moreover, the proposed framework provides a robust approach for the recognition of objects which are small in size, generalizable to many kinds of visual documents.INDEX TERMS Convolutional neural networks, digital humanities, digital preservation, document analysis, geospatial analysis, geospatial artificial intelligence, human settlement patterns, image analysis, weakly supervised learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.