We improved access to the Koninklijke Bibliotheek's collection of Dutch historical newspapers by using machine learning techniques and crowdsourcing to link named entities in the newspaper articles to the corresponding Wikidata descriptions. Indexing the Wikidata identifiers of named entities together with the newspaper articles opens up new possibilities: retrieving the articles that mention these resources, and searching the newspaper collection through semantic relations from Wikidata. In this paper we describe the steps we have taken so far to set up this combination of entity linking, machine learning and crowdsourcing in our research environment, as well as our planned activities to improve the quality of the links and to extend the semantic search capabilities.
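The following is a minimal Python sketch of what "indexing the Wikidata identifiers together with the articles" can enable. It is not the authors' implementation. It assumes that entity linking has already attached Wikidata QIDs to each article, and every article ID and QID below is a hypothetical placeholder. Only `P31` ("instance of") is a real Wikidata property. A production system would keep the index in a search engine and would obtain relations from Wikidata itself, for example through its SPARQL endpoint, rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical entity-linking output: article ID -> Wikidata QIDs found in it.
# All identifiers are placeholders, not real newspaper articles or QIDs.
linked_articles = {
    "article-001": ["Q_AMSTERDAM", "Q_PERSON_A"],
    "article-002": ["Q_ROTTERDAM"],
    "article-003": ["Q_PERSON_B", "Q_AMSTERDAM"],
}

# Hypothetical local copy of a few Wikidata statements as
# (subject, property, object) triples. "P31" (instance of) is a real
# Wikidata property; the QIDs are placeholders.
triples = [
    ("Q_AMSTERDAM", "P31", "Q_CITY"),
    ("Q_ROTTERDAM", "P31", "Q_CITY"),
    ("Q_PERSON_A", "P31", "Q_HUMAN"),
    ("Q_PERSON_B", "P31", "Q_HUMAN"),
]


def build_entity_index(links):
    """Invert article -> entities into entity -> set of articles."""
    index = defaultdict(set)
    for article_id, qids in links.items():
        for qid in qids:
            index[qid].add(article_id)
    return index


def subjects_with(prop, obj, statements):
    """Return every subject that has the statement (subject, prop, obj)."""
    return {s for s, p, o in statements if p == prop and o == obj}


def semantic_search(prop, obj, index, statements):
    """Return articles mentioning any entity related to obj via prop."""
    hits = set()
    for qid in subjects_with(prop, obj, statements):
        hits |= index.get(qid, set())
    return sorted(hits)


index = build_entity_index(linked_articles)
# Direct retrieval: articles that mention one specific entity.
print(sorted(index["Q_AMSTERDAM"]))
# Semantic retrieval: articles that mention any entity that is an instance of a city.
print(semantic_search("P31", "Q_CITY", index, triples))
```

The second query never names a specific city. The Wikidata relation supplies the set of matching entities, and the inverted index turns that set into articles. This is the kind of search the plain text of the newspapers cannot support on its own.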
0.2 The model has been designed without regard to any specific implementation. In subsequent phases of the project, UKOLN proposes to translate the model into a schema and, from there, to construct a demonstrator implementation. The need to reflect the complexity that underlies collection description has led to a multidimensional model, and some of the possible vehicles for implementation may not fully support such a structure. Schemes such as RDF and XML may provide a richer implementation, with secondary mappings to simpler standards such as HTML, but some aspects of the structure will inevitably be lost in such mappings. It is hoped that the model is comprehensive enough to clarify which aspects of any implementation truly reflect the reality of collection description and which merely derive from the structure of the implementation mechanism.

Collection Description v.3-1, Michael Heaney

0.3 Collection description is such a broad term that it is worth saying something about the intended scope of the model. Although the model has its origin in the RSLP programme, many of whose results will be digital resources of one kind or another, it is not restricted to the description of digital collections. The model is intended to be applicable to physical and digital collections of all kinds, including library, art and museum materials, and is by no means limited to the resources of large research libraries. Collection description itself may take a variety of forms, and the model makes no presumption about the format of such a description.

0.4 The model is aimed in the first instance at those responsible for developing collection descriptions. It is also a general contribution to the debate about metadata in the digital age. As described above, its initial use will be to inform the construction of a demonstrator to which all relevant RSLP projects can feed information. With the model as its base, the demonstrator is intended to be appropriate for, and hospitable to, their requirements for collection description. In terms developed below (see sections 5.5 and 6), the demonstrator will accommodate Unitary Finding-Aids for the collections.

0.5 Although the primary purpose of this model is to illumine the process of resource discovery by users, collection description also serves collection management purposes, particularly in discharging an institution's curatorial responsibilities.

The information landscape

1.1 The information landscape can be seen as a contour map with mountains, hillocks, valleys, plains and plateaux. A large general collection of information, say a research library, can be seen as a plateau raised above the surrounding plain. A specialized collection of particular importance is like a sharp peak. Upon a plateau there might be undulations representing strengths and weaknesses.

1.2 The scholar surveying this landscape is looking for the high points. A high point represents an area where the potential for gl...
Library catalogues may be connected to the linked data cloud through various types of thesauri. For name authority thesauri in particular, I would like to suggest a fundamental break with the current distributed linked data paradigm: a transition from a multitude of different identifiers to a single, universal identifier for all relevant named entities, namely the Wikidata identifier. Wikidata (https://wikidata.org) seems to be evolving into a major authority hub that lowers the barriers to accessing the web of data for everyone. Using the Wikidata identifier of notable entities as a common identifier for connecting resources has significant benefits compared to traversing the ever-growing linked data cloud. Once the use of Wikidata reaches critical mass, Wikidata could even serve some institutions as an authority control mechanism.
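The benefit of a single identifier can be illustrated with a minimal Python sketch. It is hypothetical and is not taken from the paper. It assumes a crosswalk from (authority scheme, local identifier) pairs to Wikidata QIDs is available. Such a crosswalk could plausibly be harvested from Wikidata's external-identifier statements, such as VIAF IDs. The scheme labels, identifiers, QIDs and record titles below are all placeholders. Once every record carries the same QID, "all records for this person" becomes a single-key lookup instead of a walk across separate authority files.

```python
# Hypothetical crosswalk: (authority scheme, local identifier) -> Wikidata QID.
# Scheme labels, identifiers and QIDs are placeholders for illustration only.
crosswalk = {
    ("viaf", "viaf-0001"): "Q_AUTHOR_X",
    ("gnd", "gnd-0001"): "Q_AUTHOR_X",
    ("lcnaf", "lcnaf-0001"): "Q_AUTHOR_X",
    ("viaf", "viaf-0002"): "Q_AUTHOR_Y",
}

# Hypothetical catalogue records, each pointing at a name authority in
# whichever scheme the cataloguing institution happened to use.
records = [
    {"title": "Record A", "creator": ("viaf", "viaf-0001")},
    {"title": "Record B", "creator": ("gnd", "gnd-0001")},
    {"title": "Record C", "creator": ("lcnaf", "lcnaf-0001")},
    {"title": "Record D", "creator": ("viaf", "viaf-0002")},
    {"title": "Record E", "creator": ("gnd", "gnd-9999")},  # not in crosswalk
]


def normalise(record, mapping):
    """Return a copy of the record with a single Wikidata QID (or None)."""
    return {**record, "wikidata": mapping.get(record["creator"])}


def group_by_entity(recs, mapping):
    """Group record titles by Wikidata QID; unmapped records go under None."""
    groups = {}
    for rec in (normalise(r, mapping) for r in recs):
        groups.setdefault(rec["wikidata"], []).append(rec["title"])
    return groups


groups = group_by_entity(records, crosswalk)
for qid, titles in groups.items():
    print(qid, titles)
```

Records that cannot be mapped end up under `None`. In this sketch, that is where manual reconciliation would still be needed before the Wikidata identifier could act as the sole authority key.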