Nowadays, organizations cannot satisfy their information needs from one data source. Moreover, multiple data sources across the organization fuels the need for data integration. Data integration system's users pose queries in terms of an integrated schema and expect accurate, unambiguous, and complete answers. So the data integration system is not limited to, getting the answers to the queries from the sources, but also it is extended to detect and resolve the data quality problems appeared due to the integration process. The most crucial component in any data integration system is the mappings constructed between the data sources and the integrated schema. In this paper a new mapping approach is proposed to map not only the elements of the integrated schema as done by the existing approaches, but also it maps other elements required in detecting and resolving the duplicates. It provides a means to facilitate future extensibility and changes to both the sources and the integrated schema. The proposed approach provides a linkage between the fundamental components required to provide accurate and unambiguous answers to the users' queries from the integration system.
Data fusion in the virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources.It refers to the process of selecting or fusing at tribute values from the clustered duplicates into a singlerecord representing the real world object. In this paper, astatistical technique for data fusion is introduced based on some probabilistic scores from both data sources and clustered duplicate
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.