We demonstrate the schema and ontology matching tool COMA++. It extends our previous prototype COMA, which uses a composite approach to combine different match algorithms [3]. COMA++ implements significant improvements and offers a comprehensive infrastructure to solve large real-world match problems. It comes with a graphical interface enabling a variety of user interactions. Using a generic data representation, COMA++ uniformly supports schemas and ontologies, e.g., the powerful standard languages W3C XML Schema and OWL. COMA++ includes new approaches for ontology matching, in particular the utilization of shared taxonomies. Furthermore, different match strategies can be applied, including various forms of reusing previously determined match results and a so-called fragment-based match approach, which decomposes a large match problem into smaller problems. Finally, COMA++ can be used not only to solve match problems but also to comparatively evaluate the effectiveness of different match algorithms and strategies.
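The composite idea mentioned above, combining the scores of several independent matchers into one decision, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two matchers (`name_similarity`, `token_similarity`), the averaging step, and the threshold are hypothetical stand-ins and are not taken from COMA or COMA++.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Matcher = Callable[[str, str], float]


def tokenize(name: str) -> List[str]:
    # Split on camelCase boundaries and non-letter characters, lowercased.
    return [p.lower() for p in re.findall(r"[A-Za-z][a-z]*|\d+", name)]


def name_similarity(a: str, b: str) -> float:
    # Hypothetical matcher 1: character-level string similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    # Hypothetical matcher 2: Jaccard overlap of name tokens.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def composite_match(
    source: List[str],
    target: List[str],
    matchers: List[Matcher],
    threshold: float = 0.4,
) -> Dict[str, Tuple[str, float]]:
    """For each source element, keep the best target whose averaged score
    across all matchers reaches the threshold (an assumed selection rule)."""
    result: Dict[str, Tuple[str, float]] = {}
    for s in source:
        best_name, best_score = None, 0.0
        for t in target:
            score = sum(m(s, t) for m in matchers) / len(matchers)
            if score > best_score:
                best_name, best_score = t, score
        if best_name is not None and best_score >= threshold:
            result[s] = (best_name, best_score)
    return result


matches = composite_match(
    ["shipToCity", "custName"],
    ["ship_to_city", "customer_name", "price"],
    [name_similarity, token_similarity],
)
```

Averaging is only one possible way to aggregate; taking the maximum or a weighted sum would fit the same structure by changing a single line.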
Current schema matching approaches still need to improve for very large and complex schemas. Such schemas are increasingly written in the standard language W3C XML Schema, especially in e-business applications. The high expressive power and versatility of this schema language, in particular its type system and its support for distributed schemas and namespaces, introduce new issues. In this paper, we study some of the important problems in matching such large XML schemas. We propose a fragment-oriented match approach to decompose a large match problem into several smaller ones and to reuse previous match results at the level of schema fragments.
Schema and ontology matching have attracted a great deal of interest among researchers. Despite the advances achieved, matching large schemas remains a real challenge because it is a time-consuming and memory-intensive process. We therefore propose a scalable, clustering-based matching approach that breaks up the large matching problem into smaller matching problems. In particular, we first introduce a structure-based clustering approach to partition each schema graph into a set of disjoint subgraphs (clusters). Then, we propose a new measure that efficiently determines similar clusters between every two sets of clusters to obtain a set of small matching tasks. Finally, we adopt the matching prototype COMA++ to solve individual matching tasks and combine their results. The experimental analysis indicates that the proposed method yields encouraging and significant improvements.
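The three-step pipeline described above (partition each schema, pair up similar clusters, match only within the pairs) can be sketched roughly as follows. This is not the paper's method: the partitioning rule (one cluster per subtree under the root), the Jaccard-based `cluster_similarity`, and the threshold are all assumptions chosen to keep the example small, and the final element-level matching step is left out.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, List[str]]  # parent element -> list of child elements


def subtree(schema: Schema, node: str) -> Set[str]:
    # Collect a node and all of its descendants.
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(schema.get(current, []))
    return seen


def partition(schema: Schema, root: str) -> List[Set[str]]:
    # Assumed stand-in for structure-based clustering:
    # each child of the root defines one disjoint cluster.
    return [subtree(schema, child) for child in schema.get(root, [])]


def cluster_similarity(c1: Set[str], c2: Set[str]) -> float:
    # Assumed measure: Jaccard overlap of lowercased element names.
    a = {n.lower() for n in c1}
    b = {n.lower() for n in c2}
    return len(a & b) / len(a | b) if a | b else 0.0


def matching_tasks(
    clusters_a: List[Set[str]],
    clusters_b: List[Set[str]],
    threshold: float = 0.3,
) -> List[Tuple[int, int]]:
    # Keep only cluster pairs similar enough to be worth matching in detail.
    return [
        (i, j)
        for i, ca in enumerate(clusters_a)
        for j, cb in enumerate(clusters_b)
        if cluster_similarity(ca, cb) >= threshold
    ]


schema_a: Schema = {
    "Order": ["Customer", "Item"],
    "Customer": ["name", "city"],
    "Item": ["sku", "price"],
}
schema_b: Schema = {
    "PO": ["Buyer", "Line"],
    "Buyer": ["name", "city", "zip"],
    "Line": ["sku", "qty", "price"],
}

tasks = matching_tasks(partition(schema_a, "Order"), partition(schema_b, "PO"))
```

In this toy input, two of the four possible cluster pairs survive the filter, so an element-level matcher would only need to run on those two small tasks instead of on the full cross product of elements.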
The XML Wrapper is a new feature of the federated database capabilities of DB2/UDB v8. It enables users and applications to issue SQL queries against XML data from a variety of sources, including files and web services. The XML Wrapper assumes hierarchical XML documents modeled as families of virtual relational tables in a federated schema, which can then be queried to extract information from the XML and combine it with data from other sources. Due to the nature of the problem, using the XML Wrapper is complex, and several difficult steps must be undertaken: (i) The hierarchical schema of the source must be flattened to a relational form. (ii) Each relation of the flattened schema must be registered in DB2 as a NICKNAME, a complex virtual table definition containing several XPaths as specialized options. (iii) Each NICKNAME must be accompanied by a VIEW, again a complex structure involving join conditions. Chocolate is a tool that alleviates all three tasks: Chocolate provides several flattening strategies and an interface allowing users to modify the automatically generated target schema. Once the user is satisfied with the schema, Chocolate automatically generates the corresponding NICKNAME and VIEW definitions.
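Step (i), flattening a hierarchical schema into relations, can be illustrated with one simple strategy. The sketch below is an assumption and not one of Chocolate's strategies: the root and every repeating element become a table, non-repeating nested elements are inlined as columns, and each table and column records a relative XPath. The `Element` and `Table` types are hypothetical, and no DB2 NICKNAME or VIEW syntax is generated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Element:
    name: str
    repeating: bool = False
    children: List["Element"] = field(default_factory=list)


@dataclass
class Table:
    name: str
    xpath: str                      # row element, relative to the parent table's row
    parent: Optional[str]           # name of the parent table, if any
    columns: List[Tuple[str, str]]  # (column name, XPath relative to the row element)


def flatten(root: Element) -> List[Table]:
    tables: List[Table] = []

    def build(elem: Element, xpath: str, parent: Optional[str]) -> None:
        # Start a new table for the root or for a repeating element.
        table = Table(elem.name, xpath, parent, [])
        tables.append(table)
        if not elem.children:
            table.columns.append((elem.name, "./text()"))
        collect(elem, ".", table)

    def collect(elem: Element, rel: str, table: Table) -> None:
        for child in elem.children:
            path = f"{rel}/{child.name}"
            if child.repeating:
                build(child, path, table.name)   # one-to-many: separate table
            elif child.children:
                collect(child, path, table)      # one-to-one: inline the columns
            else:
                # Simplification: column names are not disambiguated on collision.
                table.columns.append((child.name, path + "/text()"))

    build(root, "//" + root.name, None)
    return tables


customer = Element("customer", children=[
    Element("name"),
    Element("address", children=[Element("city")]),
    Element("order", repeating=True, children=[
        Element("id"),
        Element("item", repeating=True, children=[Element("sku")]),
    ]),
])

flat = flatten(customer)
```

The `parent` field records which table each child table hangs off, which is the information a later step would need to derive join conditions for a view.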