Abstract. rdf dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually -but rarely-applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the rdf dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for rdf datasets stemming originally from (semi-)structured data (e.g., csv, xml, json). In this work, we focus on assessing and improving their mappings. We incorporate (i) a test-driven approach for assessing the mappings instead of the rdf dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as dbpedia, or newly generated, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an rdf dataset in the observed cases.
Linked Data is often generated based on a set of declarative rules using languages such as R2RML and RML. These languages are built with machine-processability in mind. It is thus not always straightforward for users to define or understand rules written in these languages, preventing them from applying the desired annotations to the data sources. In the past, graphical tools were proposed. However, next to users who prefer a graphical approach, there are users who desire to understand and define rules via a text-based approach. For the latter, we introduce an enhancement to their workflow. Instead of requiring users to manually write machine-processable rules, we propose writing human-friendly rules, and generate machine-processable rules based on those human-friendly rules. At the basis is YARRRML: a human-readable text-based representation for declarative generation rules. We propose a novel browser-based integrated development environment (IDE) called Matey, showcasing the enhanced workflow. In this work, we describe our demo. Users can experience first hand how to generate triples from data in different formats by using YARRRML's representation of the rules. The actual machine-processable rules remain completely hidden when editing. Matey shows that writing human-friendly rules enhances the workflow for a broader range of users. As a result, more desired annotations will be added to the data sources which leads to more desired Linked Data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.