From Distillation to Hard Negative Sampling

Formal, Thibault; Lassance, Carlos; Piwowarski, Benjamin; Clinchant, Stéphane

doi:10.1145/3477495.3531857

Cited by 67 publications

(27 citation statements)

References 13 publications

“…We consider dense models: i) a "standard" bi-encoder (bi) trained with negative log-likelihood, ii) TAS-B [28] (bi-tasb) whose training relies on topic-sampling and knowledge distillation, iii) and finally CoCondenser [22] (bi-cc) and Contriever [29] (bi-ct) which are based on contrastive pre-training. We also consider two models from the sparse family: SPLADE [21] (sp) with default training strategy, and its improved version SPLADE++ [19,20] (sp++) based on distillation, hard-negative mining and pre-training. We finally consider the late-interaction ColBERTv2 [41] (colb2).…”

Section: Methods (mentioning)

confidence: 99%

“…In the meantime, another research branch brought lexical models up to date, by taking advantage of BERT and the proven efficiency of inverted indices in various manners. Such sparse approaches for instance learn contextualized term weights [10,34,55,33], query or document expansion [36], or both mechanisms jointly [21,20]. This new wave of NIR systems, which substantially differ from lexical ones, and from each other, demonstrate state-of-the-art results on several datasets, from MS MARCO [3] on which models are usually trained, to zero-shot settings such as the BEIR [46] or LoTTE [41] benchmarks.…”

Section: Related Work (mentioning)

confidence: 99%

“…In this sense, the performance that QPP methods achieve on NIR systems seems to correlate with the importance these systems give to lexical signals. In this regard, Formal et al. [20] observed how late-interaction and sparse architectures tend to rely more on lexical signals, compared to dense ones. To further corroborate this observation, we apply the predictors to three versions of SPLADE++ with various levels of sparsity as controlled by the regularization hyperparameter.…”

Section: QPP Models Performance (mentioning)

confidence: 99%

Query Performance Prediction for Neural IR: Are We There Yet?

Faggioli, Formal, Marchesin et al. 2023

Preprint

Evaluation in Information Retrieval (IR) relies on post-hoc empirical procedures, which are time-consuming and expensive operations. To alleviate this, Query Performance Prediction (QPP) models have been developed to estimate the performance of a system without the need for human-made relevance judgements. Such models, usually relying on lexical features from queries and corpora, have been applied to traditional sparse IR methods, with various degrees of success. With the advent of neural IR and large Pre-trained Language Models, the retrieval paradigm has significantly shifted towards more semantic signals. In this work, we study and analyze to what extent current QPP models can predict the performance of such systems. Our experiments consider seven traditional bag-of-words and seven BERT-based IR approaches, as well as nineteen state-of-the-art QPPs evaluated on two collections, Deep Learning '19 and Robust '04. Our findings show that QPPs perform statistically significantly worse on neural IR systems. In settings where semantic signals are prominent (e.g., passage retrieval), their performance on neural models drops by as much as 10% compared to bag-of-words approaches. On top of that, in lexical-oriented scenarios, QPPs fail to predict performance for neural IR systems on those queries where they differ from traditional approaches the most.

“…Now that every corpus is translated to English, we took one of the SPLADE++ [5] models and finetuned 16 different versions one on each translation (in English we just use the MIRACL corpus). This led to what we call T-SPLADE, which added to the BM25 and mDPR leads to "HYBRID 1".…”

Section: Going Back To English Leads To Improvement (mentioning)

confidence: 99%

“…We follow the strategy we used on our latest TREC notebooks, in that we strive for making this more streamlined than a normal research paper would be. We will now present a list of the papers that better introduce and detail the models we used here and refer the reader to check them for a better explanation than those we have here, that are mainly dedicated to how to apply it to MIRACL and not to the methods themselves: i) Training non English SPLADE models [11], ii) The SPLADE model [5,10], iii) The Contriever model and its pretraining [8], iv) The RankT5 reranker [16], v) MonoT5 [13], vi) The LCE loss [6], vii) ColBERT [9], and viii) For our ensembling we use Ranx [1] and their min-max normalized sum ensembling.…”

Section: Introduction (mentioning)

confidence: 99%

Extending English IR methods to multi-lingual IR

Lassance 2023

Preprint

This paper describes our participation in the 2023 WSDM Cup - MIRACL challenge. Via a combination of i) document translation; ii) multilingual SPLADE and Contriever; and iii) multilingual RankT5 and many other models, we were able to get first place in both the known and surprise languages tracks. Our strategy mostly revolved around getting the most diverse runs for the first stage and then throwing all possible reranking techniques at them. While this was not a first for many techniques, we had some things that we believe were never tried before; for example, we train the first SPLADE model that is effectively capable of working in more than 10 languages. However, a more careful study of the results is needed in order to verify if we were able to get first place just due to brute force or if the hybrids we developed really brought improvements over the other teams' solutions.
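
The citation statement above mentions ensembling first-stage runs with a "min-max normalized sum". As a rough illustration of that idea only, the following plain-Python sketch rescales each run's scores to [0, 1] and sums them per document. It does not use or reproduce the Ranx API; the function names `min_max_normalize` and `fuse_sum`, the toy scores, and the handling of constant-score runs are all assumptions made for this example.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one query's scores to [0, 1].

    Assumption: a run whose scores are all equal maps to 0.0;
    real libraries may handle this edge case differently.
    """
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in run}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in run.items()}


def fuse_sum(runs):
    """Sum the normalized scores of each document across all runs.

    A document missing from a run simply contributes nothing for that run.
    Returns (doc_id, fused_score) pairs sorted by decreasing fused score.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc_id, score in min_max_normalize(run).items():
            fused[doc_id] += score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores for a single query from two hypothetical first-stage systems.
bm25 = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
splade = {"d2": 30.0, "d3": 20.0, "d4": 10.0}

ranking = fuse_sum([bm25, splade])
for doc_id, score in ranking:
    print(doc_id, score)
```

Normalizing before summing keeps a system with a large raw score range (here the hypothetical `splade` run) from dominating one with a small range; documents retrieved by several systems accumulate score from each.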
