Efficient keyword search for smallest LCAs in XML databases

Xu, Yu; Papakonstantinou, Yannis

doi:10.1145/1066157.1066217

Cited by 379 publications

(557 citation statements)

References 25 publications

Mentioning: 548


“…Information retrieval by keyword search on probabilistic XML has been studied by Li et al [47]. Specifically, they perform keyword search in the ProTDB model by adopting the notion of Smallest Lower Common Ancestor (SLCA) [69], which defines when an XML node constitutes an answer for a keyword-search query. More particularly, the problem they explore is that of finding the k nodes with the highest probabilities of being SLCAs in a random world.…”

Section: Top-k Queries (mentioning)

confidence: 99%

Probabilistic XML: Models and Complexity

Kimelfeld

Senellart

2013

Advances in Probabilistic Databases for Uncertain Information Management


Abstract. Uncertainty in data naturally arises in various applications, such as data integration and Web information extraction. Probabilistic XML is one of the concepts that have been proposed to model and manage various kinds of uncertain data. In essence, a probabilistic XML document is a compact representation of a probability distribution over ordinary XML documents. Various models of probabilistic XML provide different languages, with various degrees of expressiveness, for such compact representations. Beyond representation, probabilistic XML systems are expected to support data management in a way that properly reflects the uncertainty. For instance, query evaluation entails probabilistic inference, and update operations need to properly change the entire probability space. Efficiently and effectively accomplishing data-management tasks in that manner is a major technical challenge. This chapter reviews the literature on probabilistic XML. Specifically, this chapter discusses the probabilistic XML models that have been proposed, and the complexity of query evaluation therein. Also discussed are other data-management tasks like updates and compression, as well as systemic and implementation aspects.


“…SLCA [20] is proposed to find the smallest LCA that doesn't contain other LCA in its subtree. XSEarch [6] is a variation of LCA, which claims two nodes n1 and n2 are related if there is no two distinct nodes with same tag name on the paths from their LCA to n1 and n2.…”

Section: Related Work (mentioning)

confidence: 99%
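
The SLCA semantics described in the statement above can be illustrated with a small brute-force sketch. This is not the efficient algorithm of the cited paper; it only restates the definition. It assumes, purely for illustration, that nodes are identified by Dewey-style tuples (the root is `(0,)`, its children are `(0, 0)`, `(0, 1)`, and so on) and that each keyword comes with the list of nodes that directly contain it. The names `slca` and `ancestors_or_self` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a node is a Dewey-style path from the root, e.g. (0, 1, 2).
Dewey = Tuple[int, ...]


def ancestors_or_self(node: Dewey) -> Set[Dewey]:
    """Return the node together with all of its ancestors up to the root."""
    return {node[:i] for i in range(1, len(node) + 1)}


def slca(keyword_lists: List[List[Dewey]]) -> Set[Dewey]:
    """Brute-force SLCA: nodes whose subtree contains every keyword and
    that have no proper descendant with the same property."""
    if not keyword_lists or any(not matches for matches in keyword_lists):
        return set()

    # Nodes whose subtree contains at least one match of every keyword.
    covering = None
    for matches in keyword_lists:
        covers = set()
        for m in matches:
            covers |= ancestors_or_self(m)
        covering = covers if covering is None else covering & covers

    # Keep only the "smallest" ones: no proper descendant is also covering.
    return {
        v for v in covering
        if not any(len(u) > len(v) and u[:len(v)] == v for u in covering)
    }


# Hypothetical match lists for a two-keyword query.
first_keyword = [(0, 0, 0), (0, 1, 0)]
second_keyword = [(0, 0, 1), (0, 2)]
result = slca([first_keyword, second_keyword])
```

In this example both the root `(0,)` and the node `(0, 0)` contain matches of both keywords, but only `(0, 0)` is returned because the root has a descendant that already covers the query. The sketch materializes every ancestor of every match, so it is meant for reading the definition, not for large documents.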

“…Therefore, it is desired that the search engine is able to find and extract the data fragments corresponding to the real world objects. semantics, which solve the problem by examining the data set to find the smallest common ancestors [16,13,9,7,20]. This method, while pioneering, has the drawback that its result may not be meaningful in many cases.…”

Section: Introduction (mentioning)

confidence: 99%

An Effective Object-Level XML Keyword Search

Bao

Ling

et al. 2010

Database Systems for Advanced Applications


Abstract. Keyword search is widely recognized as a convenient way to retrieve information from XML data. In order to precisely meet users' search concerns, we study how to effectively return the targets that users intend to search for. We model an XML document as a set of interconnected object-trees, where each object contains a subtree to represent a concept in the real world. Based on this model, we propose object-level matching semantics called Interested Single Object (ISO) and Interested Related Object (IRO) to capture a single object and multiple objects as the user's search targets, respectively, and design a novel relevance-oriented ranking framework for the matching results. We propose efficient algorithms to compute and rank the query results in one phase. Finally, comprehensive experiments show the efficiency and effectiveness of our approach, and an online demo of our system on DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg.


“…XML search engines employ the ranked retrieval paradigm for producing relevance-ordered result lists rather than merely using XPath or XQuery for Boolean retrieval. An important subset of XML search engines uses keyword-based queries [2,8,31], which is especially important for collections of documents with unknown or highly heterogeneous schemas. However, simple keyword queries cannot exploit the often rich annotations available in XML, so the results of an initial query are often not very satisfying.…”

Section: Motivation (mentioning)

confidence: 99%

Structural Feedback for Keyword-Based XML Retrieval

Schenkel

Theobald

2006

Lecture Notes in Computer Science


Abstract. Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to enhance retrieval quality. For keyword-based XML queries, feedback engines usually generate an expanded keyword query from the content of elements marked as relevant or nonrelevant. This approach, which is inspired by text-based IR, completely ignores the semistructured nature of XML. This paper makes the important step from pure content-based to structural feedback. It presents a framework that expands a keyword query into a full-fledged content-and-structure query. Extensive experiments with the established INEX benchmark and our TopX search engine show the feasibility of our approach.

