2021
DOI: 10.48550/arxiv.2108.08787
Preprint

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

Abstract: We present Mr. TYDI, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations. The goal of this resource is to spur research in dense retrieval techniques in non-English languages, motivated by recent observations that existing techniques for representation learning perform poorly when applied to out-of-distribution data. As a starting point, we provide zero-shot baselines for this new dataset based o…

Citations: cited by 7 publications (12 citation statements)
References: 21 publications
“…Hence, we adopt XOR-QA [Asai et al, 2020] dataset and Mr. TYDI [Zhang et al, 2021] dataset to evaluate our method on the two settings. Both of the two datasets are constructed from TYDI, a question answering dataset covering eleven typologically diverse languages.…”
Section: Cross Lingual Query Passage Retrieval (mentioning)
confidence: 99%
“…The Mr. TYDI dataset is a multi-lingual benchmark dataset for mono-lingual query passage retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations. Same to the source paper [Zhang et al, 2021], we use MRR@100 and Recall@100 as metrics.…”
Section: Cross Lingual Query Passage Retrieval (mentioning)
confidence: 99%
“…First, in the passage retrieval step, we replace mBERT with mLUKE (Ri et al, 2021). Second, we construct sparse indices from which we will retrieve passages to augment dense retriever-retrieved passages, inspired by Zhang et al (2021) but uses a different dense-sparse hybrid approach. Finally, we encode each question and passage independently as opposed to all passages together following the Fusion-in-Decoder (Izacard and Grave, 2020) approach.…”
Section: System Architecture and Pipeline (mentioning)
confidence: 99%
“…-Judgment pools were retrieved using older systems. New neural systems are thus more likely to systematically identify relevant unjudged documents [38,40,46]. -Many of the early test collections have only binary judgments.…”
Section: Introduction (mentioning)
confidence: 99%