2021
DOI: 10.48550/arxiv.2108.13897
Preprint

mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

citations
Cited by 26 publications
(13 citation statements)
“…- Query Translation: BM25 retrieval using translated queries produced by a specific MT model and original documents in the target language. - Reranking: We rerank query translation baseline results using the public mT5 reranker trained on translated MS MARCO in 8 languages [4].…”
Section: Methods (mentioning)
confidence: 99%
“…6 Machine Translation. For CLEF languages, we use MS MARCO passage translations from Bonifacio et al. [4], and the same MT model to translate queries. For the HC4 languages, we use directional MT models built on top of a transformer base architecture (6-layer encoder/decoder) using Sockeye [8].…”
Section: Methods (mentioning)
confidence: 99%
“…For example, automatic translation of the GLUE benchmark, Stanford Natural Language Inference (SNLI) Corpus, and SciTail Dataset in Portuguese is provided by [GOMES 2020]. [Bonifacio et al 2021] provides a multilingual translation of the MS MARCO passage ranking dataset.…”
Section: Related Work (mentioning)
confidence: 99%