For meaningful information exchange or integration, providers and consumers need compatible semantics between source and target systems. It is widely recognized that achieving this semantic integration is very costly. Nearly all published research concerns how system integrators can discover and exploit semantic knowledge in order to better share data among the systems they already have. This research is very important, but to make the greatest impact we must go beyond after-the-fact semantic integration among existing systems and actively guide semantic choices in new ontologies and systems (e.g., which concepts should serve as descriptive vocabularies for existing data, or as definitions for newly built systems). The goal is to ease data sharing for both new and old systems, to ensure that needed data is actually collected, and to maximize over time the business value of an enterprise's information systems.
Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult, so much work has focused on developing semi-automatic techniques to find them efficiently. In this paper we consider the complementary problem of improving the mediated schema to make finding such matches easier. Specifically, a mediated schema S will typically be matched with many source schemas. Can the developer of S therefore analyze and revise S in a way that preserves its semantics, yet makes it easier to match in the future? In this paper we answer this question affirmatively and outline a promising solution direction, called mSeer. Given a mediated schema S and a matching tool M, mSeer first computes a matchability score that quantifies how easily S can be matched using M. Next, mSeer uses this score to generate a matchability report that identifies the problems in matching S. Finally, mSeer addresses these problems by automatically suggesting changes to S (e.g., renaming an attribute or reformatting data values) that it expects will preserve the semantics of S while making it more amenable to matching. We present extensive experiments over several real-world domains that demonstrate the promise of the proposed approach.
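The abstract does not give mSeer's scoring formula. As a rough illustration only, the sketch below assumes a matchability score defined as the fraction of source schemas (with known correct matches) in which a matcher recovers each mediated attribute's correct counterpart, and a report that lists the low-scoring attributes. A toy name-similarity matcher built on Python's `difflib` stands in for the matching tool M. The function names, the 0.6 similarity threshold, and the 0.5 report cutoff are all hypothetical; the third step (suggesting semantics-preserving changes) is not sketched.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the matching tool M: a simple name-similarity matcher.
def normalize(name: str) -> str:
    return name.lower().replace("_", "").replace("-", "")

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def toy_matcher(mediated, source, threshold=0.6):
    """Map each mediated attribute to its most similar source attribute,
    keeping the pair only if the similarity reaches the (assumed) threshold."""
    matches = {}
    for attr in mediated:
        best = max(source, key=lambda s: similarity(attr, s))
        if similarity(attr, best) >= threshold:
            matches[attr] = best
    return matches

def matchability(mediated, sources, matcher=toy_matcher):
    """Assumed score: per attribute, the fraction of sources in which the
    matcher reproduces the known correct match; also return the mean."""
    found = [matcher(mediated, attrs) for attrs, _ in sources]
    per_attr = {}
    for attr in mediated:
        hits = total = 0
        for (_, truth), matches in zip(sources, found):
            if attr not in truth:
                continue  # this source has no counterpart for attr
            total += 1
            if matches.get(attr) == truth[attr]:
                hits += 1
        per_attr[attr] = hits / total if total else 0.0
    overall = sum(per_attr.values()) / len(per_attr)
    return overall, per_attr

def matchability_report(per_attr, cutoff=0.5):
    """List attributes scoring below the cutoff, worst first."""
    return sorted((a for a, s in per_attr.items() if s < cutoff), key=per_attr.get)

mediated = ["email", "zip"]
sources = [
    (["email", "postcode"], {"email": "email", "zip": "postcode"}),
    (["e_mail", "postal_code"], {"email": "e_mail", "zip": "postal_code"}),
]
overall, per_attr = matchability(mediated, sources)
print(overall, per_attr)              # 0.5 {'email': 1.0, 'zip': 0.0}
print(matchability_report(per_attr))  # ['zip']
```

Here `zip` is flagged because the toy matcher never links it to `postcode` or `postal_code`; in the workflow the abstract describes, this is the kind of problem a renaming suggestion would then target.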
Data collaborations allow users to draw upon diverse resources to solve complex problems. While collaborations enable a greater ability to manipulate data and services, they also create new security vulnerabilities. Collaboration participants need methods to detect suspicious behaviors (potentially caused by malicious insiders) and to assess trust in information as it passes through many hands. In this work, we describe these challenges and introduce provenance as a way to address them. We describe a provenance system, PLUS, and show how it can be used to assist in assessing trust and detecting suspicious behaviors. A preliminary study shows this to be a promising direction for future research.

Index terms: provenance, trust, insider threat, lineage, pedigree.

Provenance is "information that helps determine the derivation history of a data product... [It includes] the ancestral data product(s) from which this data product evolved, and the process of transformation of these ancestral data product(s)" [22].
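The abstract does not describe how PLUS computes trust or flags suspicious behavior. As a minimal sketch under stated assumptions, provenance makes two simple checks possible: weakest-link trust propagation (a data product is treated as no more trustworthy than its least trusted ancestor) and a scan of a product's lineage for steps performed by flagged actors. The graph layout, numeric trust values, and all function names below are hypothetical and are not PLUS's data model or API.

```python
# Hypothetical provenance graph: each item (data product or process step)
# maps to the items it was derived from. Not PLUS's data model or API.

def ancestors(node, parents):
    """Return every item in the lineage of `node` (iterative depth-first search)."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(parents.get(item, []))
    return seen

def derived_trust(node, parents, base_trust):
    """Assumed weakest-link policy: trust of `node` is the minimum of its own
    base trust and the base trust of every ancestor."""
    return min([base_trust[node]] + [base_trust[a] for a in ancestors(node, parents)])

def suspicious_lineage(node, parents, flagged):
    """Return the flagged items (e.g., steps by a suspected insider) in node's lineage."""
    return sorted(ancestors(node, parents) & set(flagged))

parents = {
    "merged_report": ["sensor_feed", "analyst_edit"],
    "summary": ["merged_report"],
}
base_trust = {"sensor_feed": 0.9, "analyst_edit": 0.4, "merged_report": 1.0, "summary": 0.95}

print(derived_trust("summary", parents, base_trust))            # 0.4
print(suspicious_lineage("summary", parents, ["analyst_edit"]))  # ['analyst_edit']
```

Weakest-link propagation is only one possible policy; a real system might weight ancestors by the kind of transformation applied, which the lineage recorded in a provenance graph would also support.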