Unstructured metadata fields such as 'description' offer tremendous value for users to understand cultural heritage objects. However, this type of narrative information is of little direct use in a machine-readable context due to its unstructured nature. This paper explores the possibilities and limitations of Named-Entity Recognition (NER) and Term Extraction (TE) to mine such unstructured metadata for meaningful concepts. These concepts can be used to enhance otherwise limited searching and browsing operations, but they can also play an important role in fostering Digital Humanities research. In order to catalyze experimentation with NER and TE, the paper evaluates the performance of three third-party entity extraction services through a comprehensive case study based on the descriptive fields of the Smithsonian Cooper-Hewitt National Design Museum in New York. In order to cover both NER and TE, we first offer a quantitative analysis of the named entities retrieved by the services in terms of precision and recall against a manually annotated gold-standard corpus, and then complement this approach with a more qualitative assessment of the relevant terms extracted. Based on the outcomes of this twofold analysis, the conclusions present the added value of entity extraction services, but also indicate the dangers of uncritically using NER and/or TE, and by extension Linked Data principles, within the Digital Humanities. All metadata and tools used within the paper are freely available, making it possible for researchers and practitioners to repeat the methodology. By doing so, the paper offers a significant contribution towards understanding the value of entity recognition and disambiguation for the Digital Humanities.
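The quantitative part of such an evaluation boils down to comparing the entities a service returns against a manually annotated gold standard. The following minimal sketch illustrates this with exact string matching on invented sample data; the paper's own corpus, matching criteria, and services are not reproduced here.

```python
# Minimal sketch of the quantitative evaluation step: comparing entities
# returned by an extraction service against a manually annotated gold standard.
# The sample entities and the exact-match criterion are illustrative assumptions.

def precision_recall(extracted, gold):
    """Compute precision, recall and F1 over two sets of entity strings."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = (2 * precision * recall / denominator) if denominator else 0.0
    return precision, recall, f1

# Hypothetical entities from a single 'description' field
gold_entities = {"Cooper-Hewitt", "New York", "Tiffany"}
service_entities = {"Cooper-Hewitt", "New York", "Louis Comfort"}

p, r, f = precision_recall(service_entities, gold_entities)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```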
This article challenges common assumptions and opinions regarding the use of the social web by cultural heritage institutions by framing the phenomenon of user-generated metadata within the larger context of the commodification and the engagement process of our cultural heritage. Theoretical reflections on both the negative and positive long-term outcomes of the social web for libraries, archives, and museums are presented and confronted with empirical observations regarding the use of social tagging and user comments. This combination of a theoretical and an empirical approach will provide original insights into the long-term implications of user-generated metadata for cultural heritage institutions.
Purpose: This paper seeks to present a conceptual framework to analyze and improve the quality of empirical databases throughout time, with operational results which are measurable in terms of cost-benefit.
Design/methodology/approach: Basing themselves on the general approach of hermeneutics and, more specifically, on Fernand Braudel's concept of "temporalités étagées" and Norbert Elias's "evolutive continuum", the authors develop a temporal framework consisting of three stratified time levels in order to interpret shifts in the quality of databases. The soundness of the framework and its capability of delivering operational results are demonstrated by the development of a case study focusing on social security databases. A second case study in the context of digital cultural heritage is also developed to illustrate the general applicability of this interdisciplinary approach in the context of empirical information systems.
Findings: Contrary to the assertions of common theories that postulate a permanent bijective relationship between records in a database and the corresponding reality, this paper provides insights which demonstrate that a database evolves over time along with the interpretation of the values that it allows one to determine. These interdisciplinary insights, when applied practically to concrete case studies, give rise to original operational results in the ICT field of data quality.
Practical implications: The framework helps both the managers and the users of empirical databases to understand the necessity of integrating unforeseen observations, neglected a priori by virtue of the closed world assumption, and to develop operational recommendations to enhance the quality of databases.
Originality/value: This paper is the first to show the potential of hermeneutics for the task of understanding the evolution of an empirical information system, and also the first to deliver operational outcomes.
The concept of Linked Data has made its entrance into the cultural heritage sector due to its potential for integrating heterogeneous collections and deriving additional value from existing metadata. However, practitioners and researchers alike need a better understanding of what outcome they can reasonably expect from the reconciliation process between their local metadata and established controlled vocabularies which are already part of the Linked Data cloud. This paper offers an in-depth analysis of how a locally developed vocabulary can be successfully reconciled with the Library of Congress Subject Headings (LCSH) and the Art and Architecture Thesaurus (AAT) with the help of a general-purpose tool for interactive data transformation (Google Refine). Issues negatively affecting the reconciliation process are identified, and solutions are proposed in order to derive maximum value from existing metadata and controlled vocabularies in an automated manner.
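By way of illustration, the core of such a reconciliation step can be approximated with simple label normalisation and fuzzy string matching against a vocabulary excerpt. The vocabulary labels, identifiers, and matching threshold below are invented placeholders; the paper itself relies on Google Refine's interactive reconciliation features rather than this simplified stand-in.

```python
# Illustrative sketch of reconciling local terms against a controlled
# vocabulary such as LCSH. Labels and identifiers are placeholders invented
# for the example; the actual workflow in the paper uses Google Refine.
from difflib import get_close_matches

# Hypothetical excerpt of a controlled vocabulary (label -> placeholder identifier)
controlled_vocabulary = {
    "Furniture design": "lcsh-placeholder-1",
    "Textile fabrics": "lcsh-placeholder-2",
    "Wallpaper": "lcsh-placeholder-3",
}

def normalise(term: str) -> str:
    """Basic clean-up that removes casing, hyphenation and spacing differences."""
    return " ".join(term.lower().replace("-", " ").split())

def reconcile(local_term: str, vocabulary: dict, cutoff: float = 0.85):
    """Return the best matching vocabulary entry for a local term, if any."""
    normalised_labels = {normalise(label): label for label in vocabulary}
    match = get_close_matches(normalise(local_term), normalised_labels, n=1, cutoff=cutoff)
    if match:
        label = normalised_labels[match[0]]
        return label, vocabulary[label]
    return None

print(reconcile("furniture-design", controlled_vocabulary))  # matched entry
print(reconcile("ceramics", controlled_vocabulary))          # None: no candidate above the cutoff
```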
Purpose: Advanced usage of Web Analytics tools makes it possible to capture the content of user queries. Despite their relevance, the manual analysis of large volumes of user queries is problematic. This paper demonstrates the potential of using information extraction techniques and Linked Data to gain a better understanding of the nature of user queries in an automated manner.
Design/methodology/approach: The paper presents a large-scale case study conducted at the Royal Library of Belgium, consisting of a data set of 83 854 queries resulting from 29 812 visits over a 12-month period of the historical newspapers platform BelgicaPress. By making use of information extraction methods, knowledge bases and various authority files, this paper presents the possibilities and limits of identifying what percentage of end users are looking for person and place names.
Findings: Based on a quantitative assessment, our method can successfully identify the majority of person and place names in user queries. Due to the specific character of user queries and the nature of the knowledge bases used, a limited number of queries remained too ambiguous to be treated in an automated manner.
Originality/value: This paper demonstrates in an empirical manner both the possibilities and limits of gaining more insights from user queries extracted from a Web Analytics tool and analysed with the help of information extraction tools and knowledge bases. The methods and tools used are generalisable and can be reused by other collection holders.
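As a rough illustration of the query classification step, an off-the-shelf NER model can be used to flag queries that contain person or place names. The snippet below substitutes spaCy for the information extraction tools, knowledge bases and authority files actually used in the study, and the sample queries are invented.

```python
# Hedged sketch of flagging person and place names in user queries.
# spaCy is a stand-in for the tools used in the paper; queries are invented.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

queries = [
    "Leopold II Congo",
    "Rue de la Loi Brussels",
    "world exhibition 1910",
]

for query in queries:
    doc = nlp(query)
    persons = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    places = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    print(query, "->", {"persons": persons, "places": places})
```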