2023
DOI: 10.48550/arxiv.2302.14723
Preprint

Extending English IR methods to multi-lingual IR

Abstract: This paper describes our participation in the 2023 WSDM CUP - MIRACL challenge. Via a combination of i) document translation; ii) multilingual SPLADE and Contriever; and iii) multilingual RankT5 and many other models, we were able to get first place in both the known and surprise languages tracks. Our strategy mostly revolved around getting the most diverse runs for the first stage and then throwing all possible reranking techniques at them. While this was not a first for many techniques, we had some things that we bel…
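The "most diverse runs for the first stage" strategy described in the abstract is usually realized by fusing several ranked lists per query before any reranking. The snippet below is a minimal illustrative sketch, not the authors' code: it applies reciprocal rank fusion (RRF) to hypothetical run dictionaries mapping query IDs to ranked document IDs, using the conventional constant k = 60; the run names (SPLADE, Contriever, BM25 over translated documents) are assumptions based on the abstract.

```python
from collections import defaultdict

def reciprocal_rank_fusion(runs, k=60, top_n=100):
    """Fuse several ranked lists per query with reciprocal rank fusion.

    runs: list of dicts, each {query_id: [doc_id, ...]} ranked best-first.
    Returns {query_id: [doc_id, ...]} re-sorted by summed RRF score.
    """
    fused = {}
    query_ids = set().union(*(run.keys() for run in runs))
    for qid in query_ids:
        scores = defaultdict(float)
        for run in runs:
            for rank, doc_id in enumerate(run.get(qid, []), start=1):
                scores[doc_id] += 1.0 / (k + rank)
        fused[qid] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return fused

# Hypothetical first-stage runs: e.g. multilingual SPLADE, Contriever, and
# BM25 over machine-translated documents, each a ranked list per query.
splade_run = {"q1": ["d3", "d1", "d7"]}
contriever_run = {"q1": ["d1", "d9", "d3"]}
bm25_translated_run = {"q1": ["d7", "d3", "d2"]}

print(reciprocal_rank_fusion([splade_run, contriever_run, bm25_translated_run])["q1"])
```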

Cited by 2 publications (2 citation statements); references 12 publications.

Citation statements:
“…Two-stage retrieval (Matveeva et al., 2006; Liu et al., 2009; Wang et al., 2011; Yang et al., 2019) is a widely adopted approach that combines the strengths of retrieval models and rerank models for effective information retrieval. It has emerged as the preferred pipeline for competitive IR competition tasks (Lassance, 2023; Huang et al., 2023; Zhang et al., 2023).…”
Section: Two-stage Retrieval
Mentioning (confidence: 99%)
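For context on the quoted pipeline: a cheap first-stage retriever produces a small candidate set from the whole corpus, and a heavier reranker rescores only those candidates. The sketch below is illustrative only, not code from the cited papers; it assumes the rank_bm25 and sentence-transformers packages, and the cross-encoder checkpoint and toy corpus are placeholders.

```python
# Illustrative two-stage retrieval: BM25 first stage + cross-encoder rerank.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "neural retrieval with learned sparse representations",
    "dense passage retrieval for open-domain question answering",
    "classical term matching with BM25",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model

def search(query, first_stage_k=2):
    # Stage 1: cheap lexical retrieval over the whole corpus.
    scores = bm25.get_scores(query.split())
    candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:first_stage_k]
    # Stage 2: expensive cross-encoder rescoring of the small candidate set.
    pairs = [(query, corpus[i]) for i in candidates]
    rerank_scores = reranker.predict(pairs)
    reranked = sorted(zip(candidates, rerank_scores), key=lambda x: x[1], reverse=True)
    return [(corpus[i], float(s)) for i, s in reranked]

print(search("sparse neural retrieval"))
```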
“…However, directly applying Eq. (2) to multilingual pre-trained MLM heads over the whole vocabulary increases computation cost for both training and inference due to the large vocabulary size |V_wp| in the MLM projector, W_mlm (Nair et al., 2022; Lassance, 2023). For example, mBERT and XLM-R have respective vocabulary sizes of 120K and 250K (vs BERT's 35K) in the MLM projector.…”
Section: Softmax
Mentioning (confidence: 99%)
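The cost the passage refers to comes from the final projection onto the vocabulary: the MLM head multiplies each token representation by a |V| x d matrix, so both parameters and per-token compute grow linearly with |V|. The back-of-the-envelope script below uses the vocabulary sizes quoted above and assumes a hidden size of 768 (as in the base-sized models); it is an illustration, not a measurement from the cited work.

```python
# Rough cost of the MLM projector W_mlm (shape |V| x d): parameters and
# per-token multiply-adds scale linearly with the vocabulary size |V|.
hidden_size = 768  # assumed hidden dimension (BERT-base / mBERT / XLM-R base)

vocab_sizes = {
    "BERT": 35_000,    # sizes as quoted in the citing paper
    "mBERT": 120_000,
    "XLM-R": 250_000,
}

for name, vocab in vocab_sizes.items():
    params = vocab * hidden_size               # entries in W_mlm
    flops_per_token = 2 * vocab * hidden_size  # multiply-adds for one logit pass
    print(f"{name:6s}  |V|={vocab:>7,d}  W_mlm params={params:>12,d}  "
          f"FLOPs/token~{flops_per_token:,d}")
```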