Efficient Text Proximity Search

Schenkel, Ralf; Broschart, Andreas; Hwang, Seung-won; Theobald, Martin; Weikum, Gerhard

doi:10.1007/978-3-540-75530-2_26

Cited by 49 publications

(50 citation statements)

References 16 publications

“…The basic segmentation is the one where each keyword is treated as a phrase [6], [7]. Each generated segmentation corresponds to a way of accessing the indexes to compute its answers.…”

Section: Valid Phrases In a Query (mentioning)

confidence: 99%

Efficient Proximity Search with Query Logs

Holambe

Patil

2015

IJCA

Information retrieval offers various techniques for fetching data from resources, and each technique has its own issues. Retrieval techniques need advanced manipulation schemes that improve keyword search. Many techniques have been proposed, but their results degrade on large amounts of data. This paper aims to achieve efficient time and space complexity by integrating proximity information. The system improves performance by reusing previous search results. Previous systems all consist of basic solutions for extracting results and ranking them. Query logs hold the most recent search results, which are reused for the next search. Fuzzy keyword search enhances system usability. Existing database systems require the complete keyword to be typed for a search, whereas an auto-complete scheme makes it easy to type less and find more. The system uses a demand paging algorithm to find previous results. General Terms: Algorithm, Performance.

“…Schenkel et al. [18] developed efficient top-k query processing techniques for a proximity-aware IR model. They focused on a proximity-aware scoring function defined by a linear combination of a standard BM25-based score and a proximity score, and extended an existing top-k query processing technique [20] that was originally intended for a standard IR model such as TF-IDF and BM25.…”

Section: Related Work (mentioning)

confidence: 99%
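
The linear combination described in this statement can be illustrated with a minimal sketch. It is not the scoring function from Schenkel et al.: the BM25 variant, the pairwise inverse-square proximity measure, and the names `bm25_term`, `proximity`, `combined_score`, and `weight` are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term (a common textbook variant)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm


def proximity(positions):
    """Hypothetical proximity score for one document.

    `positions` maps each query term to its word positions in the document.
    Every pair of occurrences of two distinct terms at distance d adds 1/d^2,
    so occurrences that are close together count far more than distant ones.
    """
    total = 0.0
    for pos_a, pos_b in combinations(positions.values(), 2):
        total += sum(1.0 / (i - j) ** 2 for i in pos_a for j in pos_b)
    return total


def combined_score(bm25, prox, weight=1.0):
    """Linear combination of a content score and a proximity score."""
    return bm25 + weight * prox


# Usage: two query terms, one of them occurring twice in the document.
doc_positions = {"text": [1, 5], "search": [3]}
content = bm25_term(tf=2, df=10, n_docs=1000, doc_len=120, avg_len=100)
print(combined_score(content, proximity(doc_positions), weight=0.5))
```

The `weight` parameter controls how strongly proximity influences the final ranking; setting it to 0 falls back to the content score alone.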

“…They showed that their techniques speeded up evaluation considerably with an improved result quality. However, since the underlying top-k query processing technique intends for relatively short queries as does other existing top-k query processing techniques, those evaluated efficiently by their techniques in [18] are limited to relatively short queries.…”

Section: Related Work (mentioning)

confidence: 99%

“…There has been a considerable amount of work on optimization techniques including index compression and caching [21], result caching [14], and top-k query processing [1], [2], [5], [11], [13], [17], [18]. In this paper we focus on top-k query processing techniques, which find the exact top-k documents without processing the entire posting list for each query term.…”

Section: Introduction (mentioning)

confidence: 99%
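
One well-known way to find the exact top-k documents without reading whole posting lists is Fagin's threshold algorithm. The sketch below is a generic version of that algorithm, not the specific technique of any paper cited here. It assumes non-negative scores, posting lists sorted by descending score, and sum aggregation; the name `threshold_topk` is hypothetical.

```python
import heapq


def threshold_topk(lists, k):
    """Return (top-k [(total, doc)], number of sorted accesses).

    `lists` holds one posting list per query term, each a list of
    (doc_id, score) pairs sorted by descending score.
    """
    lookup = [dict(pl) for pl in lists]  # random access: doc_id -> score
    seen, heap, accesses = set(), [], 0
    for depth in range(max(map(len, lists), default=0)):
        threshold = 0  # best total any still-unseen document could reach
        for pl in lists:
            if depth >= len(pl):
                continue  # an exhausted list adds nothing to the bound
            doc, score = pl[depth]
            accesses += 1
            threshold += score
            if doc not in seen:
                seen.add(doc)
                total = sum(m.get(doc, 0) for m in lookup)
                heapq.heappush(heap, (total, doc))
                if len(heap) > k:
                    heapq.heappop(heap)  # keep only the k best so far
        # Stop once the k-th best total is at least the bound.
        if len(heap) == k and heap[0][0] >= threshold:
            break
    return sorted(heap, reverse=True), accesses


# Usage: two query terms, four documents, integer scores.
term_a = [("d1", 9), ("d2", 8), ("d3", 2), ("d4", 1)]
term_b = [("d2", 7), ("d1", 5), ("d4", 3), ("d3", 1)]
print(threshold_topk([term_a, term_b], k=2))
```

In this usage example the loop stops after reading two entries from each list, because no unseen document can exceed the current second-best total.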

Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix — Pursuit of Enhanced Informational Search on the Web —

Fujita

Oyama

2013

IEICE Trans. Inf. & Syst.

SUMMARY: With the successful adoption of link analysis techniques such as PageRank and web spam filtering, current web search engines well support "navigational search". However, due to the use of a simple conjunctive Boolean filter in addition to the inappropriateness of user queries, such an engine does not necessarily well support "informational search". Informational search would be better handled by a web search engine using an informational retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Moreover, the realization of such an engine requires a method to process the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it the technique utilizing a simple data structure called a "term-document binary matrix," resulting in more efficient evaluation of top-k queries even when the queries have been expanded. We show on the basis of experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to this data set that the proposed method can speed up evaluation considerably compared with existing techniques, especially when the number of query terms gets larger. Key words: web search engine, top-k query processing, early pruning, early termination, term-document binary matrix

“…, T_D} of D string documents of total length n, drawn from an alphabet Σ = [σ], and the query is a pattern P[1..p] over Σ. Muthukrishnan considered a family of problems called thresholded document listing: given an additional parameter K, list only the documents where some function score(P, d) of the occurrences of P in T_d exceeded K. For example, the document mining problem aims to return the documents where P appears at least K times, whereas the repeats problem aims to return the documents where two occurrences of P appear at distance at most K. While document mining has obvious connections with typical term-frequency measures of relevance [6,1], the repeats problem is more connected to various problems in bioinformatics [4,10]. Also notice that the repeats problem is closely related to the term proximity based document retrieval in IR field [32,5,29,33,34]. Muthukrishnan achieved optimal time for both problems, with O(n) space (in words) if K is specified at indexing time and O(n log n) if specified at query time.…”

Section: Introduction (mentioning)

confidence: 99%
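
The two thresholded document listing problems named in this statement can be stated concretely with a brute-force sketch. It scans every document in full and therefore has none of the indexed time or space bounds attributed to Muthukrishnan; it only pins down what each problem asks for. The function names are hypothetical, documents are identified by their 0-based index, and distance is measured between start positions of occurrences.

```python
def occurrences(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    out, pos = [], text.find(pattern)
    while pos != -1:
        out.append(pos)
        pos = text.find(pattern, pos + 1)
    return out


def document_mining(docs, pattern, K):
    """Indices of documents in which pattern appears at least K times."""
    return [i for i, text in enumerate(docs)
            if len(occurrences(text, pattern)) >= K]


def repeats(docs, pattern, K):
    """Indices of documents with two occurrences at distance at most K."""
    hits = []
    for i, text in enumerate(docs):
        occ = occurrences(text, pattern)
        # The closest pair is always adjacent in the sorted position list.
        if any(b - a <= K for a, b in zip(occ, occ[1:])):
            hits.append(i)
    return hits


# Usage on a toy collection of three documents.
docs = ["abracadabra", "banana", "cab"]
print(document_mining(docs, "a", 3))
print(repeats(docs, "ab", 7))
```

Checking only adjacent occurrences is enough for the repeats test, since positions come back in increasing order and the smallest gap must lie between neighbours.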

Top-k Term-Proximity in Succinct Space

et al. 2016

Abstract. Let D = {T_1, T_2, ..., T_D} be a collection of D string documents of n characters in total, that are drawn from an alphabet set Σ = [σ]. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P[1..p], k), can return the k documents of D most relevant to pattern P. The relevance is captured using a predefined ranking function, which depends on the set of occurrences of P in T_d. For example, it can be the term frequency (i.e., the number of occurrences of P in T_d), or it can be the term proximity (i.e., the distance between the closest pair of occurrences of P in T_d), or a pattern-independent importance score of T_d such as PageRank. Linear space and optimal query time solutions already exist for this problem. Compressed and compact space solutions are also known, but only for a few ranking functions such as term frequency and importance. However, space efficient data structures for term proximity based retrieval have been evasive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only o(n) bits on top of any compressed suffix array of D and solves queries in time O((p + k) polylog n).
