Recent years have seen a significant increase in digitization projects in the cultural heritage domain. As a result, growing efforts have been directed towards the study of natural language processing technologies that support research in the humanities. This thesis is a contribution to the study and development of new text mining strategies that allow a better exploration of contemporary history collections from an entity-centric perspective. In particular, this thesis focuses on the challenging problems of disambiguating two specific kinds of named entities: toponyms and person names. They are approached as two clearly differentiated tasks, each of which exploits the inherent characteristics associated with that kind of named entity.

Finding the correct referent of a toponym is a challenging task, and this difficulty is even more pronounced in the historical domain, as it is not uncommon for places to change their names over time. The method proposed in this thesis to disambiguate toponyms, GeoSem, is especially suited to collections of historical texts. It is a weakly-supervised model that combines the strengths of both toponym resolution and entity linking approaches by exploiting both geographic and semantic features. To do so, the method makes use of a knowledge base built from Wikipedia and complemented with additional knowledge from GeoNames.

The method has been tested on a historical toponym resolution benchmark dataset in English, where it improves on the state of the art. Furthermore, five datasets of historical news in German and Dutch have been created from scratch and annotated. The method proposed in this thesis performs significantly better on them than two out-of-the-box state-of-the-art entity linking methods when only locations are considered for evaluation.

This dissertation would not have been completed without the support and encouragement of many people.
First and foremost, I wish to thank my supervisor, Professor Caroline Sporleder, from whom I have learned much, for her trust, dedicated support and guidance during these years. I would also like to thank Professor Ramin Yahyapour for agreeing to be part of this thesis committee and for sharing with me some insightful comments to improve the structure of this dissertation. I also gratefully acknowledge Professors Ulrich Heid, Dieter Hogrefe, Gerhard Lauer, and Wolfgang May for their willingness to serve on my examination committee.