Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022
DOI: 10.1145/3477495.3531857
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

Cited by 67 publications (27 citation statements)
References 13 publications

“…We consider dense models: i) a "standard" bi-encoder (bi) trained with negative log-likelihood; ii) TAS-B [28] (bi-tasb), whose training relies on topic sampling and knowledge distillation; and iii) CoCondenser [22] (bi-cc) and Contriever [29] (bi-ct), which are based on contrastive pre-training. We also consider two models from the sparse family: SPLADE [21] (sp) with its default training strategy, and its improved version SPLADE++ [19,20] (sp++), based on distillation, hard-negative mining, and pre-training. Finally, we consider the late-interaction model ColBERTv2 [41] (colb2).…”
Section: Methods
confidence: 99%
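
For context, the "negative log-likelihood" training of the standard bi-encoder mentioned above is commonly implemented as a softmax cross-entropy over in-batch negatives. A minimal sketch, assuming dot-product scoring between pooled embeddings; the function name and shapes are illustrative, not taken from any cited implementation:

```python
import torch
import torch.nn.functional as F

def in_batch_nll_loss(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood over in-batch negatives (illustrative helper).

    q_emb: (B, dim) query embeddings; d_emb: (B, dim) embeddings of each
    query's positive document. Every other document in the batch serves
    as a negative for a given query.
    """
    scores = q_emb @ d_emb.T                           # (B, B) dot-product scores
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)             # -log p(positive | query)
```

Distillation and hard-negative mining, as used for SPLADE++, keep the same general objective but augment the in-batch negatives with mined hard negatives and teacher scores.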
“…In the meantime, another research branch brought lexical models up to date by taking advantage of BERT and the proven efficiency of inverted indices in various ways. Such sparse approaches learn, for instance, contextualized term weights [10,34,55,33], query or document expansion [36], or both mechanisms jointly [21,20]. This new wave of NIR systems, which differ substantially from lexical ones and from each other, demonstrates state-of-the-art results on several datasets, from MS MARCO [3] (on which models are usually trained) to zero-shot settings such as the BEIR [46] or LoTTE [41] benchmarks.…”
Section: Related Work
confidence: 99%
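
The "contextualized term weights" learned by SPLADE-style models are obtained, following the SPLADE papers, by log-saturating the MLM logits of a BERT-style encoder and max-pooling over the sequence, which reweights and expands the text jointly over the vocabulary. A minimal sketch of that formulation; function and variable names are mine:

```python
import torch

def splade_term_weights(mlm_logits: torch.Tensor,
                        attention_mask: torch.Tensor) -> torch.Tensor:
    """Turn BERT MLM logits (batch, seq_len, vocab) into one non-negative
    weight per vocabulary term: joint term reweighting and expansion."""
    sat = torch.log1p(torch.relu(mlm_logits))          # log-saturation of positive logits
    sat = sat * attention_mask.unsqueeze(-1)           # zero out padding positions
    return sat.max(dim=1).values                       # max-pool over the sequence
```

The resulting (batch, vocab) weights are mostly zero, so they can be served from a classical inverted index.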
“…Now that every corpus is translated to English, we took one of the SPLADE++ [5] models and fine-tuned 16 different versions, one on each translation (for English we simply use the MIRACL corpus). This led to what we call T-SPLADE, which, added to BM25 and mDPR, leads to "HYBRID 1".…”
Section: Going Back to English Leads to Improvement
confidence: 99%
“…We follow the strategy used in our latest TREC notebooks, striving to make this more streamlined than a normal research paper would be. We now list the papers that best introduce and detail the models used here, and refer the reader to them for better explanations than ours, which are mainly dedicated to applying the methods to MIRACL rather than to the methods themselves: i) training non-English SPLADE models [11], ii) the SPLADE model [5,10], iii) the Contriever model and its pre-training [8], iv) the RankT5 reranker [16], v) MonoT5 [13], vi) the LCE loss [6], vii) ColBERT [9], and viii) for ensembling, Ranx [1] and its min-max normalized sum ensembling.…”
Section: Introduction
confidence: 99%
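
The min-max normalized sum ensembling mentioned in item viii) rescales each system's scores for a query to [0, 1] before summing them, so no single run dominates through its raw score scale. A minimal sketch of the idea on plain doc-to-score dictionaries; this illustrates the method, not the Ranx API:

```python
def min_max_norm(run: dict[str, float]) -> dict[str, float]:
    """Rescale one system's scores for a query into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0                            # guard: all scores equal
    return {doc: (score - lo) / span for doc, score in run.items()}

def min_max_sum(runs: list[dict[str, float]]) -> dict[str, float]:
    """Normalize each run, then sum scores per document."""
    fused: dict[str, float] = {}
    for run in runs:
        for doc, score in min_max_norm(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return fused

# e.g. fused = min_max_sum([bm25_run, mdpr_run, t_splade_run])
```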