Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data 2005
DOI: 10.1145/1066157.1066217

Efficient keyword search for smallest LCAs in XML databases

Abstract: Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees in order to outper…
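
The "smallest tree" condition in the abstract can be illustrated with a brute-force sketch. This is not the paper's Indexed Lookup Eager algorithm; it only computes the same answer set on a toy input. It assumes each node is identified by a Dewey-style tuple (the root is `(0,)` and the child at position i of node `v` is `v + (i,)`) and that each keyword maps to the list of nodes that directly contain it. The names `slca`, `ancestors_or_self`, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: a node is identified by its Dewey-style path from the root.
Dewey = Tuple[int, ...]


def ancestors_or_self(node: Dewey) -> Set[Dewey]:
    """Return the node together with all of its ancestors (all non-empty prefixes)."""
    return {node[:i] for i in range(1, len(node) + 1)}


def slca(keyword_lists: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Brute-force smallest-LCA computation.

    A node qualifies if its subtree contains every keyword and none of its
    proper descendants also has a subtree containing every keyword.
    """
    covering: Optional[Set[Dewey]] = None
    for matches in keyword_lists.values():
        # Every node whose subtree contains this keyword at least once.
        covered: Set[Dewey] = set()
        for node in matches:
            covered |= ancestors_or_self(node)
        covering = covered if covering is None else covering & covered
    if not covering:
        return []

    # A candidate is not "smallest" if another candidate lies strictly below it.
    has_candidate_below: Set[Dewey] = set()
    for node in covering:
        for i in range(1, len(node)):
            has_candidate_below.add(node[:i])
    return sorted(covering - has_candidate_below)


# Hypothetical toy document:
#   (0,)        school
#   (0, 0)        class   -> title (0, 0, 0) "XML", instructor (0, 0, 1) "John"
#   (0, 1)        class   -> title (0, 1, 0) "XML"
#   (0, 2)        project -> member (0, 2, 0) "John"
example = {
    "john": [(0, 0, 1), (0, 2, 0)],
    "xml": [(0, 0, 0), (0, 1, 0)],
}
print(slca(example))
```

In the toy document both the root and the first class subtree contain both keywords, but the root is discarded because it contains the smaller qualifying subtree rooted at `(0, 0)`. The sketch materializes every ancestor of every match, so its cost grows with the total size of the keyword lists times the tree depth; it is meant to make the definition concrete, not to be efficient.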

Cited by 379 publications (557 citation statements)
References 25 publications
“…Information retrieval by keyword search on probabilistic XML has been studied by Li et al [47]. Specifically, they perform keyword search in the ProTDB model by adopting the notion of Smallest Lower Common Ancestor (SLCA) [69], which defines when an XML node constitutes an answer for a keyword-search query. More particularly, the problem they explore is that of finding the k nodes with the highest probabilities of being SLCAs in a random world.…”
Section: Top-k Queries | Citation type: mentioning
confidence: 99%
“…SLCA [20] is proposed to find the smallest LCA that doesn't contain other LCA in its subtree. XSEarch [6] is a variation of LCA, which claims two nodes n1 and n2 are related if there is no two distinct nodes with same tag name on the paths from their LCA to n1 and n2.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…Therefore, it is desired that the search engine is able to find and extract the data fragments corresponding to the real world objects. semantics, which solve the problem by examining the data set to find the smallest common ancestors [16,13,9,7,20]. This method, while pioneering, has the drawback that its result may not be meaningful in many cases.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…XML search engines employ the ranked retrieval paradigm for producing relevance-ordered result lists rather than merely using XPath or XQuery for Boolean retrieval. An important subset of XML search engines uses keyword-based queries [2,8,31], which is especially important for collections of documents with unknown or highly heterogeneous schemas. However, simple keyword queries cannot exploit the often rich annotations available in XML, so the results of an initial query are often not very satisfying.…”
Section: Motivation | Citation type: mentioning
confidence: 99%