Proceedings of the 14th ACM International Conference on Information and Knowledge Management 2005
DOI: 10.1145/1099554.1099630

Maximal termsets as a query structuring mechanism

Abstract: Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web pages and the terms used to compose the Web queries. The combination of these two features might lead to irrelevant query results, particularly in the case of more specific queries composed of three or more terms. To deal with this problem we propose a new technique for automatically structuring Web queries as a set of smaller subqueri…
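The abstract is cut off before it describes the technique, so the following is only a minimal sketch of the general notion of a maximal termset, not the authors' method. It assumes a termset counts as "frequent" when all of its terms co-occur in at least `min_support` documents, and that it is "maximal" when no frequent superset exists. The function name, the toy documents, and the brute-force enumeration are all illustrative assumptions; the brute force is only practical for the short queries the abstract mentions.

```python
from itertools import combinations


def maximal_termsets(query_terms, documents, min_support=1):
    """Return the maximal subsets of query_terms whose terms co-occur
    in at least min_support documents (brute force, illustrative only)."""
    terms = sorted(set(query_terms))
    doc_sets = [set(doc) for doc in documents]
    maximal = []
    # Visit larger candidates first, so every frequent superset of a
    # candidate has already been recorded (directly or via a larger set).
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            candidate = frozenset(combo)
            support = sum(1 for doc in doc_sets if candidate <= doc)
            if support < min_support:
                continue
            # Skip candidates that are contained in a recorded maximal set.
            if any(candidate < kept for kept in maximal):
                continue
            maximal.append(candidate)
    return maximal


# Hypothetical toy collection: each document is a bag of terms.
docs = [
    ["cheap", "flights", "paris"],
    ["cheap", "hotel", "paris"],
    ["flights", "hotel"],
]
query = ["cheap", "flights", "hotel", "paris"]
subqueries = maximal_termsets(query, docs, min_support=1)
```

In this toy collection no document contains all four query terms, so a purely conjunctive match would return nothing, while each maximal termset can serve as a smaller conjunctive subquery that still has matches.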

Cited by 10 publications (13 citation statements)
References: 6 publications
“…As a baseline we adapt the maximal termset approach by Pôssas et al [3], but we do not use GENMAX as a subroutine to enlarge promising keyphrase subsets. Instead, we adopt … [Table 1's example scenario: termsets over w1 to w5] …”
Section: Basic Definitions and The Baseline Methods
mentioning
confidence: 99%
“…Unfortunately, it is straightforward to construct situations in which the approach fails although adequate queries exist. A more involved maximal termset query formulation method is proposed by Pôssas et al [3]; we use an adapted version as our baseline.…”
Section: Introduction
mentioning
confidence: 99%
“…Shapiro and Taksa [11] suggest a rather simple open end query formulation approach for which it is straightforward to find situations where the approach fails although appropriate queries exist. A more involved maximal termset method is proposed by Pôssas et al [10]. However, both approaches focus on finding a whole set of queries instead of just one maximum query and neither Shapiro and Taksa …”
Section: B Related Work
mentioning
confidence: 99%
“…The retrieved web documents can then be delivered to a text reuse detection system for an in-depth analysis. We focus on the query formulation problem as the crucial first step in the detection process and present a new query formulation strategy that achieves convincing results: compared to a maximal termset query formulation strategy [10,14], which is the most sensible non-heuristic baseline, we save on average 70% of the queries in realistic experiments. With respect to the candidate documents' quality, our heuristic retrieves documents that are, on average, more similar to the given document than the results of previously published query formulation strategies [4,8].…”
mentioning
confidence: 99%