2019
DOI: 10.1177/0165551519866551
|View full text |Cite
|
Sign up to set email alerts
|

An intrinsic evaluation of the Waterloo spam rankings of the ClueWeb09 and ClueWeb12 datasets

Abstract: The ClueWeb09 dataset and its successor, the ClueWeb12 dataset, are two of the largest collections of Web pages released by Text REtrieval Conference (TREC). The ClueWeb datasets were used in various tracks of TREC ran through 2009 to 2017. For every year, approximately 50 new queries are released and a pool of Web pages are judged against these queries by human assessors as relevant, non-relevant or spam. In this article, a ground truth for binary classification (spam vs non-spam) is constructed from Web page… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
1
1
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
references
References 26 publications
0
0
0
Order By: Relevance