2012
DOI: 10.1145/2094072.2094077

High-performance processing of text queries with tunable pruned term and term pair indexes

Abstract: Term proximity scoring is an established means in information retrieval for improving result quality of full-text queries. Integrating such proximity scores into efficient query processing, however, has not been equally well studied. Existing methods make use of precomputed lists of documents where tuples of terms, usually pairs, occur together, usually incurring a huge index size compared to term-only indexes. This article introduces a joint framework for trading off index size and result quality, and provides…
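To make the trade-off concrete, here is a minimal sketch in Python of a pruned term-pair index. It is illustrative only, not the article's actual algorithm: the names (`build_indexes`, `min_pair_df`) and the simple document-frequency pruning rule are assumptions. The threshold acts as the tuning knob between index size and the fraction of pair queries served from precomputed lists.

```python
from collections import defaultdict
from itertools import combinations

def build_indexes(docs, min_pair_df=2):
    """Build a term-only index plus a pruned term-pair index.

    docs: {doc_id: text}. Pairs co-occurring in fewer than
    `min_pair_df` documents are pruned to save space (assumed
    pruning criterion, for illustration only).
    """
    term_index = defaultdict(set)   # term -> set of doc ids
    pair_index = defaultdict(set)   # (t1, t2) -> set of doc ids

    for doc_id, text in docs.items():
        terms = sorted(set(text.lower().split()))
        for t in terms:
            term_index[t].add(doc_id)
        for pair in combinations(terms, 2):
            pair_index[pair].add(doc_id)

    # Prune rare pairs: their lists cost space but rarely help queries.
    pair_index = {p: ds for p, ds in pair_index.items()
                  if len(ds) >= min_pair_df}
    return term_index, pair_index

def docs_for_pair(t1, t2, term_index, pair_index):
    """Serve a pair query from the pruned index if possible."""
    pair = tuple(sorted((t1, t2)))
    if pair in pair_index:          # fast path: precomputed list
        return pair_index[pair]
    # fallback: intersect single-term lists at query time
    return term_index.get(t1, set()) & term_index.get(t2, set())
```

Raising `min_pair_df` shrinks the pair index but forces more queries onto the slower intersection fallback, which is the size/quality-of-service trade-off the abstract describes.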

Cited by 14 publications (8 citation statements: 0 supporting, 8 mentioning, 0 contrasting). References 41 publications.

Citation statements (ordered by relevance):
“…The nextword index (NW) stores absolute positions for all 473,366,430 bi-grams in GOV2. While there are techniques that only partially store lists [3,13,19], we measure the exhaustive case in which all bi-grams are indexed. The position lists of NW are stored using UEF codes, and require 55 GiB, still less than the three suffix-based indexes.…”
Section: Methods
mentioning confidence: 99%
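The nextword index mentioned in this statement stores, for each adjacent word pair, the absolute positions of its occurrences. A minimal sketch (illustrative, not the evaluated system; compression such as the UEF coding it mentions is omitted):

```python
from collections import defaultdict

def build_nextword_index(docs):
    """Nextword-style bi-gram index: (w1, w2) -> [(doc_id, position), ...].

    Stores the absolute position of the first word of every adjacent
    pair, so phrase lookups need no scan over single-term lists.
    """
    nw = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for pos in range(len(words) - 1):
            nw[(words[pos], words[pos + 1])].append((doc_id, pos))
    return nw

# A phrase query "a b c" intersects the (a, b) and (b, c) lists,
# requiring positions offset by exactly one within the same document.
```

Indexing every bi-gram exhaustively, as the cited measurement does, is what drives the large (here 55 GiB) position-list footprint.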
“…Instead of performing list intersection at query time, the final set of ⟨i, f_{i,P}⟩ pairs can be stored in the index and accessed when needed by queries. Storage limits mean that precomputing postings lists for all phrases is impossible, and techniques have been explored to choose lists to be computed, including analyzing query logs [3,19] and using collection statistics [13]. Indexing only a subset of the phrases implies that either other ways of creating lists at query time must be provided too, or that retrieval effectiveness must be sacrificed.…”
Section: Phrase Indexing Schemes
mentioning confidence: 99%
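A sketch of the idea in this statement: precompute ⟨i, f_{i,P}⟩ postings (document id and phrase frequency) only for phrases selected in advance, here by query-log frequency. The selection policy and all function names are assumptions for illustration, not the cited systems' interfaces.

```python
from collections import Counter, defaultdict

def select_phrases(query_log, budget):
    """Keep the `budget` most frequently queried phrases
    (one possible selection policy; [13] uses collection
    statistics instead)."""
    return {p for p, _ in Counter(query_log).most_common(budget)}

def precompute_phrase_postings(docs, phrases):
    """phrase -> [(doc_id, phrase frequency f_{i,P}), ...]"""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        counts = Counter(
            " ".join(words[i:i + 2]) for i in range(len(words) - 1)
        )
        for phrase in phrases:
            if counts[phrase]:
                postings[phrase].append((doc_id, counts[phrase]))
    return postings
```

Queries for a selected phrase read its list directly; anything outside the budget falls back to query-time list intersection, or effectiveness is sacrificed, exactly the dichotomy the statement describes.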
“…Existing approaches to improving efficiency make use of term pair co-occurrence indexes, and employ early termination in order to balance space costs and query time [3,19]. But unless the index can be very large, use of proximity-based metrics beyond co-occurrence usually requires on-the-fly computation of proximity scores, and possibly high retrieval times in even moderately sized collections.…”
Section: Introduction
mentioning confidence: 99%
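When no pair index applies, proximity scores must be computed on the fly from per-term position lists. The sketch below uses a generic 1/d² pair-distance score, an assumption for illustration rather than the specific metric of [3,19]; it shows why this is costly: every candidate document requires a two-pointer scan per query-term pair.

```python
def min_pair_distance(pos_a, pos_b):
    """Smallest absolute distance between two sorted position lists
    (classic two-pointer merge scan)."""
    i = j = 0
    best = float("inf")
    while i < len(pos_a) and j < len(pos_b):
        best = min(best, abs(pos_a[i] - pos_b[j]))
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best

def proximity_score(positions):
    """positions: {term: sorted positions in this document}.
    Sums 1/d^2 over the closest occurrence of each term pair;
    quadratic in the number of query terms."""
    terms = list(positions)
    score = 0.0
    for x in range(len(terms)):
        for y in range(x + 1, len(terms)):
            d = min_pair_distance(positions[terms[x]], positions[terms[y]])
            if d < float("inf"):
                score += 1.0 / (d * d)
    return score
```

Doing this for every candidate in a large collection is what drives the high retrieval times the statement warns about, and what early termination over pair indexes tries to avoid.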
“…However, calculating proximity for all terms is computationally expensive. To address that issue, several recent studies have looked at the trade-offs possible through term pair co-occurrence indexing, or other similar means [3,5,7,14]. In contrast to viewing query terms separately, Song et al. [15] group query terms into non-overlapping phrases, each referred to as a span.…”
Section: Introduction
mentioning confidence: 99%
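A simplified sketch of span-style grouping in the spirit of Song et al. [15], with the details heavily reduced: query-term occurrences in a document are greedily merged into non-overlapping spans when they lie within a small gap, and each span is then scored as a unit. The `max_gap` parameter and the greedy rule are assumptions, not the paper's exact procedure.

```python
def group_into_spans(match_positions, max_gap=2):
    """match_positions: sorted positions where any query term occurs.
    Returns non-overlapping (start, end) spans; occurrences more than
    `max_gap` apart start a new span."""
    spans, start, prev = [], None, None
    for pos in match_positions:
        if start is None:
            start = prev = pos
        elif pos - prev <= max_gap:
            prev = pos                  # extend the current span
        else:
            spans.append((start, prev)) # close span, open a new one
            start = prev = pos
    if start is not None:
        spans.append((start, prev))
    return spans

# e.g. group_into_spans([1, 2, 5, 9, 10]) -> [(1, 2), (5, 5), (9, 10)]
```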
“…Formalisms such as Coherence Relations (Hobbs, 1990), Discourse Representation Theory (DRT) (Kamp, 1984; Kamp and Reyle, 1993; Bos, 2008), Segmented Discourse Representation Theory (SDRT) (Asher and Lascarides, 2003), and Rhetorical Structure Theory (RST) (Mann and Thompson, 1988; Marcu et al., 1999; Marcu, 2000) are relevant to the segmentation of narrative discourse, but they illuminate other aspects of structure than the ones I am focused on here. As for the discourse-level theories of story grammars, e.g., Rumelhart (1977) and van Dijk (1979), these are certainly relevant to plot and will be discussed in Chapter 4. DRT is concerned primarily with reference; while reference is discussed in this chapter, the details of the semantic representations used in DRT are outside the scope of 1.1.…”
Section: Introduction
mentioning confidence: 99%