Abstract. XML is a widespread W3C standard used by several kinds of applications for data representation and exchange over the web. In the context of a system that provides semantic integration of heterogeneous XML sources, the same information at a semantic level may have different representations in XML. However, the syntax of an XML query depends on the structure of the specific XML source. Therefore, in order to obtain the same query result, one must write a specific query for each XML source. To deal with such problem, a much better solution is to state queries against a global conceptual schema and then translate them into an XML query against each specific data source. This paper presents CXPath (Conceptual XPath), a language for querying XML sources at the conceptual level, as well as a translation mechanism that converts a CXPath query to an XPath query against a specific XML source.
Abstract.In vague queries, a user enters a value that represents some real world object and expects as the result the set of database values that represent this real world object even with not exact matching. The problem appears in databases that collect data from different sources or databases were different users enter data directly. Query engines usually rely on the use of some type of similarity metric to support data with inexact matching. The problem of building query engines to execute vague queries has been already studied, but an important problem still remains open, namely that of defining the threshold to be used when a similarity scan is performed over a database column. From the bibliography it is known that the threshold depends on the similarity metrics and also on the set of values being queried. Thus, it is unrealistic to expect that the user supplies a threshold at query time. In this paper we propose a process for estimation of recall/precision values for several thresholds for a database column. The idea is that this process is started by a database administrator in a pre-processing phase using samples extracted from database. The meta-data collected by this process may be used in query processing in the optimization phase. The paper describes this process as well as experiments that were performed in order to evaluate it.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.