Abstract: In this paper, we focus on efficient keyword query processing for XML data based on SLCA and ELCA semantics. We propose for each keyword a novel form of inverted list, which includes the IDs of nodes that directly or indirectly contain the keyword. We propose a family of efficient algorithms for both semantics that are based on the set intersection operation. We show that the problem of SLCA/ELCA computation becomes one of finding a set of nodes that appear in all involved inverted lists and satisfy certain conditions. We also propose several optimization techniques to further improve query processing performance. We have conducted extensive experiments with many alternative methods. The results demonstrate that our proposed methods outperform existing ones by up to two orders of magnitude in many cases.
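The set-intersection idea in the abstract above can be illustrated with a minimal sketch for the SLCA case only. It is not the paper's algorithm: it assumes a hypothetical `parent` map describing the XML tree and a hypothetical `keyword_nodes` map from each keyword to the nodes that directly contain it, and it uses Python sets in place of the paper's inverted-list layout and optimizations. Because each list holds a node together with all of its ancestors, the intersection of the lists is the set of common ancestors, and an SLCA is a common ancestor with no child that is also a common ancestor.

```python
def build_inverted_list(parent, direct_nodes):
    """IDs of nodes that directly or indirectly contain a keyword:
    every directly matching node plus all of its ancestors."""
    ids = set()
    for node in direct_nodes:
        # Walk up to the root; stop early once we hit an already-added ancestor.
        while node is not None and node not in ids:
            ids.add(node)
            node = parent.get(node)
    return ids


def slca(parent, keyword_nodes, query):
    """Return the SLCA nodes for the keywords in `query` (illustrative only)."""
    lists = [build_inverted_list(parent, keyword_nodes.get(k, ())) for k in query]
    if not lists:
        return set()
    # Nodes appearing in every list contain all keywords (common ancestors).
    common = set.intersection(*lists)
    # The common set is closed under "parent of", so a node is smallest
    # exactly when none of its children is also in the common set.
    parents_of_common = {parent.get(n) for n in common} - {None}
    return common - parents_of_common


# Hypothetical tree: 1 is the root; 2 and 3 are its children;
# 4 and 5 are children of 2; 6 is a child of 3.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
keyword_nodes = {"xml": [4, 6], "query": [5, 6]}
result = slca(parent, keyword_nodes, ["xml", "query"])
print(sorted(result))
```

In this toy tree, node 2 covers both keywords through its children 4 and 5, and node 6 contains both directly, so neither the root nor node 3 qualifies as smallest.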
Given a directed acyclic graph (DAG), a k-hop reachability query $u \xrightarrow{?k} v$ asks whether there exists a path from u to v with length ≤ k. Answering k-hop reachability queries is a fundamental graph operation and has been extensively studied in recent years. Because existing approaches remain inefficient in practice on large graphs, we propose a novel labeling scheme, namely HT, to accelerate k-hop reachability query answering. HT uses a constrained 2-hop distance label to maintain the lengths of shortest paths between a set of hop nodes and the other nodes; for the remaining reachability information, HT uses a novel topological level to accelerate graph traversal. Further, we enhance HT with two optimization techniques. The experimental results show that, compared with state-of-the-art approaches, HT works best for most graphs when answering k-hop reachability queries, with small index size and reasonable index construction time.
Index Terms: Graph data management, reachability queries processing, k-hop reachability.
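To make the query definition above concrete, here is a minimal sketch of the index-free baseline: a breadth-first search that stops expanding after k hops. This is not the HT labeling scheme, whose label construction the abstract does not specify; the adjacency-map representation `adj` and the function name are assumptions made for illustration.

```python
from collections import deque


def k_hop_reachable(adj, u, v, k):
    """Return True if some path from u to v has length <= k.

    `adj` maps each node to an iterable of its out-neighbours.
    Plain bounded BFS, shown only to illustrate the query semantics.
    """
    if k < 0:
        return False
    if u == v:
        return True  # the empty path has length 0
    visited = {u}
    frontier = deque([(u, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # going further would exceed the hop budget
        for nxt in adj.get(node, ()):
            if nxt == v:
                return True
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


# Hypothetical chain a -> b -> c -> d.
adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(k_hop_reachable(adj, "a", "d", 2))
print(k_hop_reachable(adj, "a", "d", 3))
```

Each query of this kind may visit every node within k hops of u, which is the traversal cost that an index is meant to reduce.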
In this paper, we focus on efficient construction of tightest matched subtree (TMSubtree) results for keyword queries on extensible markup language (XML) data, based on smallest lowest common ancestor (SLCA) semantics. Here, "matched" means that all nodes in a returned subtree satisfy the constraint that the set of distinct keywords of the subtree rooted at each node is not subsumed by that of any of its sibling nodes, while "tightest" means that no two subtrees rooted at two sibling nodes can contain the same set of keywords. Assume that d is the depth of a given TMSubtree and m is the number of keywords of a given query Q. We prove that if d ≤ m, a matched subtree result has at most 2m! nodes; otherwise, the size of a matched subtree result is bounded by (d - m + 2)m!. Based on this theoretical result, we propose a pipelined algorithm to construct TMSubtree results without rescanning all node labels. Experiments verify the benefits of our algorithm in aiding keyword search over XML data.
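The size bound stated above can be evaluated directly. The sketch below assumes that "2m!" is read as 2 · (m!), a reading chosen because it agrees with (d - m + 2)m! at d = m; the abstract itself does not disambiguate, and the function name is hypothetical.

```python
from math import factorial


def tmsubtree_size_bound(d, m):
    """Upper bound on the node count of a matched subtree result,
    for depth d and m query keywords, under the 2 * (m!) reading."""
    if d <= m:
        return 2 * factorial(m)
    return (d - m + 2) * factorial(m)


# With m = 3 keywords: depth 2 falls in the first case, depth 5 in the second.
print(tmsubtree_size_bound(2, 3))
print(tmsubtree_size_bound(5, 3))
```

For a fixed number of keywords, the bound is constant up to depth m and then grows linearly in d.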
Category: Smart and intelligent computing