1996
DOI: 10.1002/(sici)1097-4571(199601)47:1<37::aid-asi4>3.0.co;2-3

Variations in relevance assessments and the measurement of retrieval effectiveness

Abstract: The purpose of this article is to bring attention to the problem of variations in relevance assessments and the effects that these may have on measures of retrieval effectiveness. Through an analytical review of the literature, I show that despite known wide variations in relevance assessments in experimental test collections, their effects on the measurement of retrieval performance are almost completely unstudied. I will further argue that what we know about the many variables that have been found to affect …

Cited by 109 publications (67 citation statements)
References 16 publications
“…The field has almost as long a history of criticism of this experimental paradigm (Cuadra & Katter, 1967; Harter, 1996; Taube, 1965). The gist of the critics' complaint is that relevance is inherently subjective.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, the degree of URL u_1 in Table 5 is 5; we normalized it by dividing with 17, which is the sum of all URLs degrees. Equation (6) shows the Pseudo Relevance score (PR_s) for the pages fetched for the IoT-related information need.…”
Section: Text Similarity Score (mentioning)
confidence: 99%
“…Schamber arranges 80 such criteria in a table [7:11]. Harter [12] also comments on the wide range of criteria derived from such studies. Saracevic summarises these criteria, which he calls 'clues' [5:2130], noting that although there is variety in their labels, they are 'remarkably similar' in concept.…”
Section: Relevance (mentioning)
confidence: 99%