To make better decisions in business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to their (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this “open world scenario” because they do not consider semantic issues in the integration process. They neither support processing semantic data nor create a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. It thus supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in the form of a knowledge base. A comprehensive experimental evaluation on a concrete use case, comparing SETL to a solution built with traditional tools (which requires much more hand-coding), shows that SETL provides better programmer productivity, knowledge base quality, and performance. Peer reviewed. Postprint (author's final draft).
The proliferation of heterogeneous sources of ontology instances in semantic knowledge bases raises the research issue of automatic instance matching. However, automatic instance matching is heavily affected by the weights of the properties associated with instances, and measuring these property weights automatically is a formidable task. In this paper, we propose an efficient method for measuring property weights automatically and apply it to augment our state-of-the-art instance matcher, which considers the semantic specification of the properties associated with instances, for matching heterogeneous instances in semantic knowledge bases. Our experiments and evaluations show the effectiveness of automatic weight generation in ontology instance matching over various transformations of a dataset: value, logical, and structural transformations.