Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management 2007
DOI: 10.1145/1321440.1321528

A comparison of statistical significance tests for information retrieval evaluation

Abstract: Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as nonparametric significance tests for IR but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical…

Cited by 553 publications (318 citation statements)
References 21 publications
“…Differences of acylcarnitine markers between groups were tested for significance by randomised one-way ANOVA (Edgington 1995;Howell 2001;Smucker et al 2007). For analysis of acylcarnitines in relation to time of sampling, curve fitting and nonlinear regression were performed using R (Spiess 2012;Elzhov et al 2013).…”
Section: Discussion (mentioning)
confidence: 99%
“…We chose a one-tailed non-parametric randomisation test (i.e. permutation test) due to its robustness in information retrieval as shown by Smucker et al (2007). We performed the test using 100,000 random samples with a confidence interval value of 95% (i.e.…”
Section: Average R-precision (mentioning)
confidence: 99%
“…Does consolidation improve federated search? We assess the ranking using Normalized Discounted Cumulative Gain (NDCG) and report the results in Table 6a for the movie and in Table 6b for the publication scenario and indicate statistically significant improvements using Fisher's two-sided, paired randomization test [19]. Systems.…”
Section: Ranking Evaluation (mentioning)
confidence: 99%
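
Several of the statements above apply a paired randomization (permutation) test to per-topic scores of two systems, one-tailed in one case and two-sided in another. The following is a minimal sketch of such a test, not the procedure of any cited paper: the function name, its parameters, the seed, and the choice to estimate the p-value as a plain proportion of sampled permutations are all assumptions made for illustration. It relies on the usual null hypothesis that the two systems are interchangeable on each topic, so each per-topic difference is equally likely to carry either sign.

```python
import random


def paired_randomization_test(scores_a, scores_b, num_samples=100_000,
                              alternative="two-sided", seed=0):
    """Estimate a p-value for the mean per-topic difference (A minus B).

    Hypothetical helper for illustration only.
    alternative: "two-sided", or "greater" for the one-tailed test that A > B.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    if alternative not in ("two-sided", "greater"):
        raise ValueError("alternative must be 'two-sided' or 'greater'")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    extreme = 0
    for _ in range(num_samples):
        # Swapping the two systems' labels on a topic flips the sign
        # of that topic's difference; do so with probability 1/2.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if alternative == "two-sided":
            extreme += abs(permuted) >= abs(observed)
        else:
            extreme += permuted >= observed

    # Fraction of sampled permutations at least as extreme as observed.
    return extreme / num_samples
```

A call might look like `paired_randomization_test(ndcg_a, ndcg_b, alternative="greater")`, where `ndcg_a` and `ndcg_b` are hypothetical lists holding one score per topic in the same topic order. Sampling permutations rather than enumerating all 2^n sign assignments keeps the cost fixed as the number of topics grows.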