Self-indexing inverted files for fast text retrieval

Moffat, Alistair; Zobel, Justin

doi:10.1145/237496.237497

Cited by 294 publications

(246 citation statements)

References 25 publications

Supporting

Mentioning

241

Contrasting

Unclassified


“…In the average case, the complexity of an intersection is reduced to sub-linear time with the use of self-indexing [59] over the entity identifiers in order to skip and avoid unnecessary record comparisons.…”

Section: Query Processing (mentioning)

confidence: 99%


Searching web data: An entity retrieval and high-performance indexing model

Delbru

Campinas

Tummarello

2012

Journal of Web Semantics



“…Such a join is linear with the number of results in the case of the quad table, and sub-linear in average for the node and field index with the use of the self-indexing method [59]. In contrast, Semplore has often to resort to possibly expensive external sort before merge-join operations.…”

Section: Processing Complexity (mentioning)

confidence: 99%

Searching web data: An entity retrieval and high-performance indexing model

Delbru

Campinas

Tummarello

2012

Journal of Web Semantics

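The two statements above describe intersections that skip over entity identifiers instead of comparing every record. As a rough illustration only, the sketch below intersects two sorted identifier lists using evenly spaced skip points. The names `SkipList`, `seek`, `intersect`, and the `stride` parameter are hypothetical, and the binary search over skip points is a simplification chosen for brevity; it is not taken from the cited papers, which work with compressed inverted lists.

```python
from bisect import bisect_right


class SkipList:
    """Sorted, duplicate-free posting list with one skip point every `stride` entries.

    Hypothetical structure for illustration; real self-indexing stores skip
    information inside a compressed inverted list.
    """

    def __init__(self, postings, stride=4):
        self.postings = postings
        self.stride = stride
        # Identifier found at each skip point (indices 0, stride, 2*stride, ...).
        self.skip_ids = postings[::stride]

    def seek(self, pos, target):
        """Return the smallest index i >= pos with postings[i] >= target.

        Returns len(postings) if no such entry exists.
        """
        # Last skip point whose identifier is <= target; every entry before
        # it is smaller than target, so it is safe to jump there.
        j = bisect_right(self.skip_ids, target) - 1
        i = max(pos, j * self.stride) if j >= 0 else pos
        # Short linear scan inside the block that may contain target.
        while i < len(self.postings) and self.postings[i] < target:
            i += 1
        return i


def intersect(short, long_list):
    """Intersect a sorted list `short` with a SkipList `long_list`."""
    result = []
    pos = 0
    for doc_id in short:
        pos = long_list.seek(pos, doc_id)
        if pos == len(long_list.postings):
            break  # long list exhausted, no further matches possible
        if long_list.postings[pos] == doc_id:
            result.append(doc_id)
    return result


if __name__ == "__main__":
    long_list = SkipList(list(range(0, 60, 3)), stride=4)
    print(intersect([3, 9, 27, 40], long_list))
```

When the shorter list is much smaller than the longer one, most blocks of the longer list are jumped over without being scanned, which is the intuition behind the sub-linear average case mentioned in the statements.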

“…As to the physical storage and access of the logical inverted index structure, it has been thoroughly studied in the IR field, which results in many optimized methods, such as byte-aligned index compression [13] and self-indexing [14]. Furthermore, in the proposed PosIdx method, relation objects enjoy the benefit of spatial locality for fast access, because positions of a term are usually physically stored together and continuously in modern IR engines.…”

Section: Fig 2 Posidx Index Structure Example (mentioning)

confidence: 99%

“…By adding additional indexing structures to the inverted index (e.g., self-indexing [14]), modern IR engines can supply a very efficient stream reader for a posting list AIS.…”

Section: Query Evaluation (mentioning)

confidence: 99%

“…The algorithm is called Bit Vector Intersection (BVI) and shown in Algorithm 3. Note that N = |O_R| in line 1 can be directly obtained from inverted index as the document frequency of the term R without any computation at run time. Line 4 and 5 can be implemented together on the PosIdx index very efficiently using a sequential scan on the position list during merge-sort, with the help of self-indexing [14].…”

Section: Algorithm 1: Query Evaluation Algorithm (mentioning)

confidence: 99%


Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data

et al. 2007


Abstract. As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
