2005
DOI: 10.1197/jamia.m1733

Agreement, the F-Measure, and Reliability in Information Retrieval

Abstract: Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific …
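A quick restatement of the identity the abstract refers to (the symbols a, b, and c below are my own shorthand for the 2x2 agreement counts, not notation taken from the paper): let a be the number of cases both experts mark positive and b, c the two kinds of disagreement. Treating one expert as the reference gives precision P = a/(a+b) and recall R = a/(a+c), so

\[
F \;=\; \frac{2PR}{P+R}
  \;=\; \frac{2a}{2a + b + c}
  \;=\; \text{positive specific agreement},
\]

which is the quantity the paper shows the average pairwise F-measure to equal.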

Cited by 738 publications (440 citation statements)
References 5 publications
“…However, it is not suitable for entity recognition tasks [26]. We adopt the F-measure proposed by [13], which allows computing pairwise inter-annotator agreement with the standard precision, recall, and harmonic F-measure from information retrieval, treating one annotator as the gold standard and the other as the predictions. Table 1 shows the pairwise agreement for each entity class.…”
Section: Cost of the Process (mentioning)
confidence: 99%
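A minimal sketch of the pairwise agreement computation described in the excerpt above, assuming exact-match comparison of entity spans; the function name, span tuples, and annotator labels are illustrative only, not taken from the cited work.

def pairwise_f1(gold, pred):
    """Pairwise inter-annotator F1: gold and pred are sets of
    (start, end, label) entity spans produced by two annotators, one of
    which is arbitrarily treated as the gold standard."""
    tp = len(gold & pred)                       # spans both annotators marked identically
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

annotator_a = {(0, 5, "DRUG"), (12, 20, "ADR"), (30, 34, "DRUG")}
annotator_b = {(0, 5, "DRUG"), (12, 19, "ADR"), (30, 34, "DRUG")}

# Under exact matching, swapping the annotators swaps precision and recall but
# leaves their harmonic mean unchanged, so the choice of "gold" side is arbitrary.
print(f"pairwise F1 = {pairwise_f1(annotator_a, annotator_b):.3f}")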
“…This metric approximates the kappa coefficient (Cohen, 1960) when the number of true negatives (TN) is very large (Hripcsak and Rothschild, 2005). In our case, we can state that the number of TN is very high, since the TN are all the terms that are neither true positives, false positives, nor false negatives.…” (footnote 8 in the original points to http://labda.inf.uc3m.es/SpanishADRCorpus)
Section: Corpus Creation (mentioning)
confidence: 91%
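A small numeric illustration, with made-up agreement counts, of the approximation invoked in the excerpt above: Cohen's kappa converges to the positive specific agreement (equivalently, the pairwise F-measure) as the number of true negatives grows.

def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table: a = both raters positive,
    d = both negative, b and c = the two kinds of disagreement."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def psa(a, b, c):
    """Positive specific agreement, identical to the pairwise F-measure."""
    return 2 * a / (2 * a + b + c)

a, b, c = 40, 5, 7                      # fixed positives and disagreements (illustrative)
print(f"PSA = {psa(a, b, c):.4f}")
for d in (10, 100, 1_000, 100_000):     # increasingly many true negatives
    print(f"TN = {d:>7}: kappa = {kappa(a, b, c, d):.4f}")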
“…The current version of Inforex enables simultaneous and independent annotation of the same text sample by more than one annotator. Moreover, the annotation process coordinator can track inter-annotator agreement between two raters through the Agreement module, which uses the Positive Specific Agreement (PSA) measure (Hripcsak and Rothschild, 2005) to calculate reliability (see Figure 5). The view configuration allows defining annotation layers, subsets or categories, users, and the set of documents to be analysed.…”
Section: Annotation Agreement (mentioning)
confidence: 99%