Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002
DOI: 10.1145/564376.564383

The Importance of Prior Probabilities for Entry Page Search

Abstract: An important class of searches on the world-wide-web has the goal to find an entry page (homepage) of an organisation. Entry page search is quite different from Ad Hoc search. Indeed a plain Ad Hoc system performs disappointingly. We explored three non-content features of web pages: page length, number of incoming links and URL form. Especially the URL form proved to be a good predictor. Using URL form priors we found over 70% of all entry pages at rank 1, and up to 89% in the top 10. Non-content features can …
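
To make the "URL form prior" idea in the abstract concrete, the following is a minimal sketch of how URLs might be bucketed by form and how a prior could be estimated per bucket from labeled pages. The four class names (`root`, `subroot`, `path`, `file`), the function names, and the rule that a last segment without a trailing slash counts as a file are assumptions made for illustration; the abstract does not spell out the categorization.

```python
from collections import Counter
from urllib.parse import urlparse


def url_form(url: str) -> str:
    """Classify a URL by its form (illustrative classes, not from the abstract)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # An empty path or a trailing slash means the URL names a directory.
    names_directory = path.endswith("/") or not segments
    # Assumption: a trailing index.html is equivalent to the bare directory.
    if segments and segments[-1] == "index.html":
        segments.pop()
        names_directory = True
    if not segments:
        return "root"      # e.g. http://www.example.org/
    if not names_directory:
        return "file"      # e.g. http://www.example.org/dept/page.html
    return "subroot" if len(segments) == 1 else "path"


def estimate_priors(labeled_urls):
    """Estimate P(entry page | URL form) as a relative frequency.

    labeled_urls is an iterable of (url, is_entry_page) pairs.
    """
    totals, entries = Counter(), Counter()
    for url, is_entry in labeled_urls:
        form = url_form(url)
        totals[form] += 1
        entries[form] += int(is_entry)
    return {form: entries[form] / totals[form] for form in totals}
```

In practice the relative frequencies would be estimated on a training set of known entry pages, and classes with very few examples would probably need smoothing; neither step is shown here.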

Cited by 175 publications (153 citation statements)
References 22 publications
“…All that is needed is an estimate of the prior probability of relevance given a particular surface feature (or a set of such features). The content scores can then easily be updated, simply by multiplying them with this prior (similar techniques in web retrieval and XML retrieval are discussed in [9,10] respectively). Note that in other retrieval models, surface features can be incorporated in a similar fashion by using a weighting of returned element scores based on their surface features.…”
Section: Methods
Citation type: mentioning
confidence: 99%
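
The update described in this statement, multiplying each content score by the prior for the document's surface feature, can be sketched as follows. The function name, the dictionary layout, and all numeric values are hypothetical and chosen only to show the mechanics.

```python
def rerank_with_prior(content_scores, features, prior):
    """Multiply each content score by the prior of the document's surface
    feature and return (doc, score) pairs sorted from best to worst."""
    rescored = {
        doc: score * prior[features[doc]]
        for doc, score in content_scores.items()
    }
    return sorted(rescored.items(), key=lambda item: item[1], reverse=True)


# Made-up example: doc "b" has the better content score, but its surface
# feature has a much lower prior, so "a" ends up on top.
content_scores = {"a": 0.4, "b": 0.5}
features = {"a": "root", "b": "file"}
prior = {"root": 0.6, "file": 0.01}
ranking = rerank_with_prior(content_scores, features, prior)
```

Because the prior depends only on the document, it can be computed once at indexing time and applied to any query's result list.
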
“…In most cases p(r|D) is taken to be uniform [17]. However, there have been several studies where the document length and link structure have been encoded as a prior probability, for ad-hoc and some non ad hoc tasks [6], [16]. Most weighting models include document length as a part of their core query-dependent retrieval model and that might be one of the reasons for traditionally not being considered a document static feature.…”
Section: Document Priors In the Language Modeling Approach
Citation type: mentioning
confidence: 99%
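
As a rough illustration of the difference between a uniform prior and a document-length prior in a language-modeling ranker, the sketch below scores documents by log prior plus log query likelihood. The linear-interpolation smoothing, the weight `lam`, the prior proportional to length, and the toy documents are all assumptions for the example, not details taken from the quoted paper.

```python
import math
from collections import Counter


def log_query_likelihood(query, doc, collection, lam=0.5):
    """Sum of log smoothed term probabilities (linear interpolation)."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_col = col_tf[term] / len(collection)
        if p_col == 0.0:
            continue  # assumption: ignore terms unseen in the collection
        p_doc = doc_tf[term] / len(doc)
        score += math.log(lam * p_doc + (1 - lam) * p_col)
    return score


def length_prior(docs):
    """Prior proportional to document length (one simple choice)."""
    total = sum(len(tokens) for tokens in docs.values())
    return {name: len(tokens) / total for name, tokens in docs.items()}


def rank(query, docs, prior=None):
    """Rank document names by log p(D) + log p(Q|D); uniform prior by default."""
    collection = [t for tokens in docs.values() for t in tokens]
    if prior is None:
        prior = {name: 1.0 / len(docs) for name in docs}
    scores = {
        name: math.log(prior[name])
        + log_query_likelihood(query, tokens, collection)
        for name, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "short": ["entry", "page"],
    "long": ["entry", "page", "search", "web", "links", "url"],
}
uniform_ranking = rank(["entry"], docs)
length_ranking = rank(["entry"], docs, prior=length_prior(docs))
```

With the uniform prior the short document wins on term density; with the length prior the longer document overtakes it, which shows how a query-independent prior can change the ordering.
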
“…using the term frequency in specific fields of structured documents (e.g. title, abstract) [11], or integrating query-independent evidence in the retrieval model in the form of prior probabilities for a document [3,6] ('prior' because they are known before the query). In short, when determining the relevance between a query and a document, most IR models use primarily query-dependent term statistics, and sometimes also add query-independent evidence to further enhance retrieval performance.…”
Section: Introduction
Citation type: mentioning
confidence: 99%