2022
DOI: 10.1007/978-3-030-99736-6_24
HC4: A New Suite of Test Collections for Ad Hoc CLIR

Cited by 18 publications (10 citation statements); references 31 publications.
“…The HC4 [14] dataset was employed as a validation set for the selection of the most optimal translators and first-stage retrievers, due to the shared language coverage between it and NeuCLIR, as well as the existence of overlapping annotated query-document pairs between the two datasets. The RRF and SPLADE first-stage runs were provided by the NLE and h2loo teams; however, at the time of submission, the NLE team did not have a SPLADE model available for Chinese.…”
Section: Methods
confidence: 99%
“…Regarding multilingual and cross-language IR, it is crucial to have access to appropriate datasets that can be used for both the development and the evaluation of models. In recent years, several datasets that support research in this area have been made publicly available, such as FIRE [22,21], MLQA [15], NTCIR [30], Mr. TyDi [40], and HC4 [13].…”
Section: Related Work
confidence: 99%
“…We perform relevance assessment on a graded scale (0 to 3) using guidelines developed to ensure a consistent assessment process. The guidelines take inspiration from those of HC4 [19] and are adapted for our tasks (full guidelines online).…”
Section: Relevance Criteria
confidence: 99%
“…We evaluate the final retrieval models on HC4 [26], a newly constructed evaluation collection for CLIR, for Chinese and Persian; NTCIR [31] for Chinese; CLEF 08-09 for Persian [1,14]; and CLEF 03 [4] for French and German. HC4 consists of 50 topics for each language.…”
Section: Datasets
confidence: 99%