2020
DOI: 10.1007/s13222-020-00334-y
Humans Optional? Automatic Large-Scale Test Collections for Entity, Passage, and Entity-Passage Retrieval

Abstract: Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections, where no human assessments are needed. The empirical experiments confirm that automatic test collection and manual assessments agree on the best performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entity and text passages…

Cited by 5 publications (4 citation statements)
References 50 publications
“…For this reason we conduct a quality assessment of a sample from the automatically derived dataset and report results in Section 5. Using a similar technique, we provided a fully automatic dataset for the TREC Complex Answer Retrieval track, whose validity is confirmed through manual assessments produced by NIST [6]. In any case, we recommend using this dataset in combination with the manually verified Nanni's 201 dataset.…”
Section: Discussion on Automatic Test Collections (mentioning, confidence: 99%)
“…We use 20 restarts per fold with 20 iterations each. Rank-lips: a list-wise learning-to-rank toolkit with mini-batched training, using coordinate ascent to optimize for mean average precision (MAP). Mini-batches of 1000 instances are iterated until the training MAP score changes by less than 1%.…”
Section: Machine Learning (mentioning, confidence: 99%)
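The quoted configuration describes coordinate ascent over a list-wise objective, with MAP evaluated on mini-batches of queries. Below is a minimal sketch of that training loop, assuming a simple linear ranking model and a toy data layout; the names and data structures are illustrative assumptions, not the actual Rank-lips implementation.

```python
# Sketch of mini-batched coordinate ascent for a linear list-wise ranker,
# in the spirit of the quoted Rank-lips configuration. The Batch layout,
# step sizes, and helper names are assumptions made for this example.
import random
from typing import Dict, List, Tuple

# A query id maps to a list of (feature_vector, relevance_label) pairs.
Batch = Dict[str, List[Tuple[List[float], int]]]

def average_precision(scored: List[Tuple[float, int]]) -> float:
    """AP for one query: scored = [(score, label)], label 1 = relevant."""
    ranked = sorted(scored, key=lambda x: -x[0])
    hits, precisions = 0, []
    for i, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(hits, 1)

def mean_ap(weights: List[float], batch: Batch) -> float:
    """MAP of a linear model (dot product of weights and features)."""
    aps = []
    for docs in batch.values():
        scored = [(sum(w * f for w, f in zip(weights, feats)), y)
                  for feats, y in docs]
        aps.append(average_precision(scored))
    return sum(aps) / max(len(aps), 1)

def coordinate_ascent(train: Batch, dims: int,
                      restarts: int = 20, iters: int = 20,
                      batch_size: int = 1000, tol: float = 0.01) -> List[float]:
    queries = list(train.keys())
    best_w, best_map = [0.0] * dims, -1.0
    for _ in range(restarts):                       # 20 random restarts per fold
        w = [random.uniform(-1, 1) for _ in range(dims)]
        prev = 0.0
        for _ in range(iters):                      # 20 iterations each
            sample = random.sample(queries, min(batch_size, len(queries)))
            batch = {q: train[q] for q in sample}   # mini-batch of ~1000 instances
            for d in range(dims):                   # adjust one coordinate at a time
                for step in (0.5, -0.5, 0.1, -0.1):
                    trial = w[:]
                    trial[d] += step
                    if mean_ap(trial, batch) > mean_ap(w, batch):
                        w = trial
            cur = mean_ap(w, batch)
            if prev > 0 and abs(cur - prev) / prev < tol:
                break                               # training MAP moved < 1%: stop
            prev = cur
        full = mean_ap(w, train)
        if full > best_map:
            best_w, best_map = w, full
    return best_w
```

Evaluating candidate coordinate steps on a fresh mini-batch keeps each iteration cheap, while the random restarts guard against the local optima that coordinate ascent on a rank metric is prone to.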
“…We derive hierarchical and flat clustering benchmarks as described in Section 3.2, where each section is interpreted as one subtopic, i.e., one ground-truth gold cluster of passages. While this benchmark is automatically generated, it has been demonstrated to align well with relevance judgments of human assessors [13].…”
Section: Evaluation on Wikipedia (mentioning, confidence: 99%)
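As a rough illustration of how such a benchmark can be derived, the sketch below turns a Wikipedia-style section tree into flat and hierarchical gold clusters of passage ids. The Passage/Section data model is an assumption made for this example, not the paper's actual pipeline, which would parse a Wikipedia dump or the TREC CAR paragraph corpus.

```python
# Sketch: deriving flat and hierarchical clustering benchmarks from
# Wikipedia article sections. Data model is illustrative, not the
# actual pipeline from the cited work.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Passage:
    pid: str
    text: str

@dataclass
class Section:
    heading: str
    passages: List[Passage]
    subsections: List["Section"] = field(default_factory=list)

def flat_clusters(article: List[Section]) -> Dict[str, List[str]]:
    """Each top-level section = one gold cluster of its passage ids."""
    clusters: Dict[str, List[str]] = {}
    for sec in article:
        ids = [p.pid for p in sec.passages]
        # Fold passages of nested subsections into the top-level cluster.
        stack = list(sec.subsections)
        while stack:
            sub = stack.pop()
            ids.extend(p.pid for p in sub.passages)
            stack.extend(sub.subsections)
        clusters[sec.heading] = ids
    return clusters

def hierarchical_clusters(article: List[Section],
                          prefix: str = "") -> Dict[str, List[str]]:
    """Every (sub)section heading path = one gold cluster, keeping hierarchy."""
    clusters: Dict[str, List[str]] = {}
    for sec in article:
        path = f"{prefix}/{sec.heading}" if prefix else sec.heading
        clusters[path] = [p.pid for p in sec.passages]
        clusters.update(hierarchical_clusters(sec.subsections, path))
    return clusters
```

The flat variant treats each top-level section as a subtopic, while the hierarchical variant keeps every heading path as its own cluster, matching the two benchmark granularities described in the quote.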