With the increasing popularity of XML for data representation, there is considerable interest in searching XML data. Because XML data is structurally heterogeneous and its textual content is diverse, it is daunting for users to formulate exact queries and find accurate answers. Approximate matching has therefore been introduced to address this difficulty: the structure and content of a given query are first relaxed, and answers that match the relaxed queries are then sought. Ranking the results of a query and returning the most relevant ones has become the most popular paradigm in XML query processing. However, existing proposals do not adequately take structure into account, and they therefore cannot elegantly combine structure with content to answer the relaxed queries. To address this problem, we first propose a framework of query relaxations for supporting approximate queries over XML data. The answers under this framework are not required to strictly satisfy the given query formulation; instead, they can be founded on properties inferable from the original query. We then develop a novel top-k retrieval approach that generates the most promising answers in an order correlated with the ranking measure. We complement the work with a comprehensive set of experiments that show the effectiveness of the proposed approach in terms of precision and recall.
Index Terms-XML, approximate queries, query relaxations, top-k
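As a rough illustration of the relax-then-rank idea summarized above, the following minimal Python sketch relaxes a two-step path query over a toy document and collects answers in order of increasing relaxation penalty. It is not the framework or the top-k algorithm proposed in this paper: the document `DOC`, the functions `relaxations` and `top_k`, the three relaxation rules, and the penalty values are all hypothetical and chosen only to make the idea concrete.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical data, for illustration only).
DOC = """
<bib>
  <book><title>XML search</title><author>Lee</author></book>
  <book><info><title>XML ranking</title></info></book>
  <article><title>Graph search</title></article>
</bib>
"""

def relaxations(parent, child):
    """Yield (penalty, path) pairs in order of increasing penalty.

    The penalties are assumed values, not a ranking measure from the paper.
    """
    yield 0.0, f".//{parent}/{child}"    # original query: child must be a direct child
    yield 0.3, f".//{parent}//{child}"   # structural relaxation: child -> descendant
    yield 0.6, f".//{child}"             # structural relaxation: drop the parent node

def top_k(root, parent, child, keyword, k):
    """Return up to k (penalty, text) answers, least-relaxed first."""
    seen, answers = set(), []
    for penalty, path in relaxations(parent, child):
        for elem in root.findall(path):
            text = elem.text or ""
            # Content condition: case-insensitive keyword containment.
            if keyword.lower() in text.lower() and id(elem) not in seen:
                seen.add(id(elem))
                answers.append((penalty, text))
        # Penalties only grow, so once k answers exist no later
        # relaxation can produce a better-ranked one.
        if len(answers) >= k:
            break
    return answers[:k]

root = ET.fromstring(DOC)
print(top_k(root, "book", "title", "xml", 2))
```

Because the relaxed queries are evaluated from least to most relaxed, the loop can stop as soon as k answers have been found, which mirrors, in a very simplified form, the idea of producing answers in an order correlated with the ranking measure.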