1996
DOI: 10.1145/237496.237497

Self-indexing inverted files for fast text retrieval

Abstract: Query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Here we show that query response time for conjunctive Boolean queries and for informal ranked queries can be dramatically reduced, at little cost in terms of storage, by the inclusion of an internal index in each inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show that the selfindex…
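To make the idea in the abstract concrete, here is a minimal Python sketch of intersecting two sorted posting lists when the longer one carries a small internal index of skip entries. It is an illustration under stated assumptions, not the paper's implementation: the abstract describes an index included inside each inverted list, whereas this sketch keeps the skips in a separate list over uncompressed document ids. The function names `build_skips` and `intersect_with_skips` and the square-root spacing of skip entries are hypothetical choices made for the example.

```python
import math


def build_skips(postings, step=None):
    """Sample every `step`-th posting as a (doc_id, position) skip entry.

    Assumption: `postings` is a strictly increasing list of document ids.
    The default spacing of about sqrt(n) is an illustrative choice only.
    """
    if step is None:
        step = max(1, math.isqrt(len(postings)))
    return [(postings[i], i) for i in range(0, len(postings), step)]


def intersect_with_skips(short, long_list, skips):
    """Intersect two sorted doc-id lists, jumping through `long_list` via `skips`."""
    result = []
    pos = 0  # current position in long_list
    s = 0    # index of the current skip entry
    for doc in short:
        # Follow skip entries while the next one still does not pass `doc`.
        # Everything before that entry's position is smaller than `doc`,
        # so those postings never need to be compared individually.
        while s + 1 < len(skips) and skips[s + 1][0] <= doc:
            s += 1
            pos = max(pos, skips[s][1])
        # Scan linearly inside the block that may contain `doc`.
        while pos < len(long_list) and long_list[pos] < doc:
            pos += 1
        if pos < len(long_list) and long_list[pos] == doc:
            result.append(doc)
    return result


long_list = list(range(0, 1000, 3))   # a long posting list
short = [3, 10, 300, 999, 1000]       # a short posting list
skips = build_skips(long_list)
matches = intersect_with_skips(short, long_list, skips)
```

When one list is much shorter than the other, most of the long list is passed over through the skip entries rather than compared posting by posting, which is the effect the citation statements further down describe as skipping unnecessary comparisons.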

Cited by 294 publications (246 citation statements)
References: 25 publications
“…In the average case, the complexity of an intersection is reduced to sub-linear time with the use of self-indexing [59] over the entity identifiers in order to skip and avoid unnecessary record comparisons.…”
Section: Query Processing (mentioning)
confidence: 99%
“…Such a join is linear with the number of results in the case of the quad table, and sub-linear in average for the node and field index with the use of the self-indexing method [59]. In contrast, Semplore has often to resort to possibly expensive external sort before merge-join operations.…”
Section: Processing Complexity (mentioning)
confidence: 99%
“…As to the physical storage and access of the logical inverted index structure, it has been thoroughly studied in the IR field, which results in many optimized methods, such as byte-aligned index compression [13] and self-indexing [14]. Furthermore, in the proposed PosIdx method, relation objects enjoy the benefit of spatial locality for fast access, because positions of a term are usually physically stored together and continuously in modern IR engines.…”
Section: Fig 2 Posidx Index Structure Example (mentioning)
confidence: 99%
“…By adding additional indexing structures to the inverted index (e.g., self-indexing [14]), modern IR engines can supply a very efficient stream reader for a posting list AIS.…”
Section: Query Evaluation (mentioning)
confidence: 99%