Several tasks related to geographical information retrieval and to the geographical information sciences involve toponym matching, i.e., the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments involving the use of supervised machine learning for combining multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised learning, particularly when considering ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
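To make the threshold-based setting concrete, the following is a minimal sketch of toponym matching with a single string similarity metric and a fixed threshold. It is an illustration only, not the evaluated implementation: the abstract does not name the metrics or thresholds used, so the two metrics shown (normalized edit similarity and character-bigram Jaccard), the function names, and the default threshold of 0.7 are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the sets of character bigrams of two strings."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    if not union:
        # Both strings are too short to yield bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


def is_match(a: str, b: str, metric=edit_similarity, threshold: float = 0.7) -> bool:
    """Declare two toponyms a match when their similarity reaches the threshold.

    The threshold value is a hypothetical default; in practice it would be
    tuned on labelled pairs, which is the step supervised learning avoids.
    """
    return metric(a.casefold(), b.casefold()) >= threshold
```

A supervised variant would instead compute several such scores for each pair and pass them as a feature vector to a classifier, so that no per-metric threshold has to be set by hand.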
Archaeological, historical, and ethnographic research has demonstrated how mountainous environments influence the socio-cultural dynamics of the communities that live in them and in their neighbouring areas. The development of these communities tends to occur at the margins, often far away from centres of political power. This marginality also extends to movement in these regions, where mountain ranges regularly constitute mighty obstacles on account of their natural configuration, which plays a central role in strategy, commerce and travelling. In the case of western Sierra Morena in Spain, its constitution shaped both the ways of transit through the mountains during Later Prehistory and the historical routes of communication that traverse Andalucía. Using a GIS methodology developed specifically to identify particular characteristics of the landscape relevant to human movement, such as passageways, crossing points, and natural areas of transit, we examine the role that natural accessibility had for the late prehistoric societies of this region. We conclude that the locations of their habitats and symbolic places are strongly related to corridors, possibly due to an increasing importance of herding activities.
The field of Spatial Humanities has advanced substantially in recent years. The identification and extraction of toponyms and spatial information mentioned in historical text collections have allowed their use in innovative ways, making possible the application of spatial analysis and the mapping of these places with geographic information systems. For instance, automated place name identification is possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages, among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although the individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. In addition, the results showed that these NER systems are not strongly dependent on preprocessing and translation to Modern English.
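The voting idea can be sketched as follows. This is one plausible reading, not the authors' scheme: the abstract does not specify how votes are counted, so representing each system's output as a set of (start, end) character spans, requiring exact span agreement, and the `min_votes` parameter are all assumptions made for illustration.

```python
from collections import Counter


def vote(predictions, min_votes):
    """Keep the place-name spans proposed by at least `min_votes` systems.

    `predictions` holds one collection of (start, end) spans per NER system.
    Exact span agreement is assumed; partial overlaps count as different spans.
    """
    counts = Counter(span for system in predictions for span in set(system))
    return {span for span, n in counts.items() if n >= min_votes}


def precision_recall(predicted, gold):
    """Span-level precision and recall against a gold-standard annotation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical outputs of three systems on the same letter.
system_a = {(0, 6), (20, 26)}
system_b = {(0, 6), (40, 45)}
system_c = {(0, 6), (20, 26), (50, 55)}

# Simple majority: a span survives if two of the three systems agree on it.
ensemble = vote([system_a, system_b, system_c], min_votes=2)
```

Raising `min_votes` generally trades recall for precision, since fewer spans survive but those that do have broader agreement behind them.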