2012
DOI: 10.1007/978-3-642-35341-3_3

The Reusability of a Diversified Search Test Collection

Abstract: Traditional "ad hoc" test collections, typically built based on depth-100 pools, are often used a posteriori by non-contributors, i.e., research groups that did not contribute the pools. The Leave One Out (LOO) test is useful for testing whether the test collections are actually reusable: that is, whether the non-contributors can be evaluated fairly relative to the contributors' official performances. In contrast, at the recent web search result diversification tasks of TREC and NTCIR, diversity test collectio…
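The Leave One Out idea in the abstract can be illustrated with a minimal sketch. It assumes a single topic, binary relevance, and precision at k as a stand-in measure; the names `build_pool`, `precision_at_k`, and `leave_one_out` are hypothetical and are not taken from the paper, which may use different measures and pooling details. Unjudged documents are treated as non-relevant here, which is also an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# team -> run name -> ranked list of document ids (one topic only, for brevity)
Runs = Dict[str, Dict[str, List[str]]]


def build_pool(runs: Runs, depth: int, exclude_team: Optional[str] = None) -> Set[str]:
    """Union of the top-`depth` documents of every run, optionally skipping one team."""
    pool: Set[str] = set()
    for team, team_runs in runs.items():
        if team == exclude_team:
            continue
        for ranking in team_runs.values():
            pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def leave_one_out(
    runs: Runs, relevant: Set[str], depth: int, k: int
) -> Dict[str, Tuple[float, float]]:
    """For each run, return (official score, score with its own team left out of the pool)."""
    results: Dict[str, Tuple[float, float]] = {}
    for team, team_runs in runs.items():
        # Pretend this team never contributed: keep only the relevant
        # documents that the remaining teams would have pooled.
        loo_pool = build_pool(runs, depth, exclude_team=team)
        loo_relevant = relevant & loo_pool
        for name, ranking in team_runs.items():
            official = precision_at_k(ranking, relevant, k)
            left_out = precision_at_k(ranking, loo_relevant, k)
            results[name] = (official, left_out)
    return results
```

A large gap between the two scores of a run suggests that a team which did not contribute to the pools would be underestimated relative to the contributors.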

Cited by 11 publications (14 citation statements)
References 29 publications
“…However, it should be noted that diversity test collections are highly unlikely to be reusable [9,11]: thus, if researchers want to continue improving diversified search, we do require a new diversity test collection. Note also that now a new corpus, ClueWeb12, is available [4].…”
Section: Future Directions
mentioning
confidence: 99%
“…Whereas, all runs from LIA and TUTA1 significantly underperformed THUIR-S-E-4A. Chinese Subtopic Mining (Figure 4): TUTA1-S-C-1A outperformed all other runs in terms of Mean D♯-nDCG, but the six participating teams are statistically indistinguishable from one another. The TREC 2011 and 2012 diversity test collections have graded relevance assessments; all TREC diversity test collections (2009-2012) have the informational and navigational subtopic tags.…”
mentioning
confidence: 97%
“…The exact cutoff z used for each run is referred to as the pool depth. This strategy tends to find most relevant documents for each topic, but provides no guarantees particularly when entirely new systems are evaluated [33,8,24,23].…”
Section: Introduction
mentioning
confidence: 98%
“…Actually, there exist another option that we can reuse the historical labels in evaluation to save the labeling efforts. Nevertheless, due to the existence of the unlabeled documents, current measures for novelty and diversity are not reusable [26].…”
Section: Introduction
mentioning
confidence: 99%