Abstract. Mapping languages allow us to define how Linked Data is generated from raw data, but only if the raw data values can be used as is to form the desired Linked Data. Since complex data transformations remain out of scope for mapping languages, these steps are often implemented as custom solutions, or with systems separate from the mapping process. The former remain case-specific and are often coupled with the mapping, whereas the latter are not reusable across systems. In this paper, we propose an approach where data transformations (i) are defined declaratively and (ii) are aligned with the mapping languages. We align data transformations described using the Function Ontology (FnO) with the mapping of data to Linked Data described using the RDF Mapping Language (RML). We validate that our approach can map and transform DBpedia in a declaratively defined and aligned way. Our approach is not case-specific: data transformations are independent of their implementation and thus interoperable, while the functions are decoupled and reusable. This allows developers to improve the generation framework, whilst contributors can focus on the actual Linked Data, as no dependencies remain between the transformations and either the generation framework or their implementations.
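The decoupling described in this abstract can be illustrated with a minimal Python sketch. It is not the FnO or RML vocabulary, nor the authors' implementation: the function IRIs under `http://example.org/`, the dictionary-based rule format, and the names `IMPLEMENTATIONS`, `MAPPING` and `generate` are all assumptions made for illustration. The mapping refers to a transformation only by its identifier, and the implementation is bound to that identifier separately.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical function identifiers. In the approach described above these
# would be declarative function descriptions, not Python constants.
TO_UPPER = "http://example.org/fn#toUpperCase"
PARSE_INT = "http://example.org/fn#parseInt"

# Implementations are bound to identifiers outside the mapping, so they can
# be replaced without editing any mapping rule.
IMPLEMENTATIONS: Dict[str, Callable[..., object]] = {
    TO_UPPER: lambda value: value.upper(),
    PARSE_INT: lambda value: int(value.replace(",", "")),
}

# Declarative mapping (illustrative format): each rule names a predicate and
# either a raw field reference or a function with parameter bindings.
MAPPING = {
    "subject_template": "http://example.org/city/{name}",
    "rules": [
        {"predicate": "http://example.org/label",
         "function": TO_UPPER, "args": {"value": "name"}},
        {"predicate": "http://example.org/population",
         "function": PARSE_INT, "args": {"value": "population"}},
        {"predicate": "http://example.org/country",
         "function": None, "reference": "country"},
    ],
}

Triple = Tuple[str, str, object]


def generate(record: Dict[str, str], mapping: dict,
             implementations: Dict[str, Callable[..., object]]) -> List[Triple]:
    """Apply the mapping to one record, resolving functions by identifier."""
    subject = mapping["subject_template"].format(**record)
    triples: List[Triple] = []
    for rule in mapping["rules"]:
        if rule["function"] is None:
            # Raw value used as is, which is what plain mapping rules cover.
            obj = record[rule["reference"]]
        else:
            # Look up the implementation and bind parameters to source fields.
            fn = implementations[rule["function"]]
            kwargs = {param: record[field] for param, field in rule["args"].items()}
            obj = fn(**kwargs)
        triples.append((subject, rule["predicate"], obj))
    return triples


record = {"name": "Ghent", "population": "262,219", "country": "Belgium"}
triples = generate(record, MAPPING, IMPLEMENTATIONS)
```

Swapping the entry for `PARSE_INT` in `IMPLEMENTATIONS` changes how the value is parsed without touching `MAPPING`, which is the kind of independence between mapping and implementation the abstract argues for.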
Abstract. The DBpedia Extraction Framework (DBpedia EF), the generation framework behind one of the Linked Open Data cloud's central interlinking hubs, has limitations with regard to the quality, coverage and sustainability of the generated dataset. DBpedia can be further improved on both the schema and the data level. Errors and inconsistencies can be addressed by amending (i) the DBpedia EF; (ii) the DBpedia mapping rules; or (iii) Wikipedia itself, from which the framework extracts information. However, even though the DBpedia EF and the mapping rules are continuously evolving and several changes have been applied to both, there have been no significant improvements to the DBpedia dataset since its limitations were identified. To address these shortcomings, we propose adopting a different, semantic-driven approach that decouples, in a declarative manner, the execution of the extraction, transformation and mapping rules. In this paper, we provide details regarding the new DBpedia EF, its architecture, technical implementation and extraction results. This way, we achieve an enhanced data generation process that can be broadly adopted and that improves the quality, coverage and sustainability of the generated dataset.
Abstract. DBpedia data is largely generated by extracting and parsing the wikitext from the infoboxes of Wikipedia. This generation process is handled by the DBpedia Extraction Framework (DBpedia EF). This framework currently consists of data transformations, a series of custom hard-coded steps which parse the wikitext, and schema transformations, which model the resulting RDF data. Therefore, applying changes to the resulting RDF data requires both Semantic Web expertise and development within the DBpedia EF. As such, the current DBpedia data is being shaped by a small number of core developers. However, by describing both schema and data transformations declaratively, we shape and generate DBpedia data using solely declarations, separating the concerns of implementation and modeling. The development of parsing functions is decoupled from the DBpedia EF, and other data transformation functions can easily be integrated during DBpedia data generation. This demo showcases an interactive Web application that allows non-technical users to (re-)shape the DBpedia data and use external data transformation functions, solely by editing a mapping document via HTML controls.