2021
DOI: 10.48550/arxiv.2109.10086
Preprint
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

Abstract: In neural Information Retrieval (IR), ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries that could inherit the desirable properties of bag-of-words models, such as the exact matching of terms and the efficiency of inverted indexes. Introdu…
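The "efficiency of inverted indexes" the abstract alludes to can be sketched in a few lines: with sparse vocabulary-sized vectors, only the posting lists of the query's nonzero terms need to be visited to compute dot-product scores. The toy documents, weights, and function names below are illustrative, not from the paper.

```python
from collections import defaultdict

# Toy sparse vectors: term -> weight (vocabulary-sized, mostly zero).
docs = {
    "d1": {"neural": 1.2, "retrieval": 0.8},
    "d2": {"sparse": 1.5, "retrieval": 0.5, "index": 0.9},
}

# Build an inverted index: term -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def score(query):
    """Dot-product scoring via the inverted index: only posting lists
    of the query's nonzero terms are traversed, so cost scales with
    the query's sparsity rather than the collection size."""
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index[term]:
            scores[doc_id] += qw * dw
    return dict(scores)

print(score({"retrieval": 1.0, "sparse": 2.0}))
```

Exact term matching falls out for free: a document with none of the query's terms is never touched and receives no score.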

Cited by 40 publications (84 citation statements)
References 22 publications
“…We also fine-tune the unsupervised model on MS-MARCO and evaluate it on a suite of zero-shot search tasks in the BEIR benchmark (Thakur et al, 2021). In the transfer setting, our models achieve a 5.2% relative improvement over previous methods (Izacard et al, 2021) and are comparable even with methods (Santhanam et al, 2021; Formal et al, 2021; Wang et al, 2020) that demand substantially more computation at test time.…”
Section: Introduction
confidence: 67%
“…ColBERT v2 (Santhanam et al, 2021) is a multi-vector method that represents the query and the documents as a set of vectors, and employs a multi-step retrieval procedure to obtain relevant documents. SPLADE v2 (Formal et al, 2021) represents queries and documents as sparse vectors of size equivalent to the vocabulary of the BERT encoder (Devlin et al, 2019). Our cpt-text models compute only one dense embedding per document, which is indexed offline, and do not depend on any cross-attention re-ranker at query time.…”
Section: BEIR Search
confidence: 99%
“…The final representation of the input sequence is then obtained by conducting a pooling operation on the set of |V|-dimensional representations with N elements. For instance, in SPLADEv2 [Formal et al, 2021a], the j-th weight is defined as:…”
Section: Preliminaries
confidence: 99%
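The pooling the quote refers to is, per the SPLADE v2 paper, a max over the input's N tokens of log-saturated ReLU'd vocabulary logits: w_j = max_i log(1 + ReLU(w_ij)), yielding a single sparse |V|-dimensional vector. A minimal sketch, assuming plain Python lists of logits and illustrative names of our own choosing:

```python
import math

def splade_pool(logits):
    """SPLADE-style max pooling. `logits` is an N x |V| matrix of
    per-token vocabulary logits; the j-th output weight is
    max over tokens i of log(1 + relu(logits[i][j]))."""
    n_vocab = len(logits[0])
    weights = []
    for j in range(n_vocab):
        weights.append(max(math.log(1.0 + max(row[j], 0.0)) for row in logits))
    return weights

# Toy example: an input of 2 tokens, vocabulary of 3 terms.
logits = [
    [0.0, 1.0, -2.0],   # token 1
    [3.0, -1.0, 0.5],   # token 2
]
w = splade_pool(logits)   # [log(4), log(2), log(1.5)]
```

The ReLU keeps weights non-negative and the log dampens dominant terms, which (together with a sparsity regularizer during training) is what keeps the resulting vocabulary-sized vectors sparse enough for inverted-index retrieval.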
“…Our improved model, which we call SPLADE-mask, is competitive with state-of-the-art lexical retrieval models that incorporate more complex multi-stage training regimes [Mallia et al, 2021, Lin and Ma, 2021, Zhuang and Zuccon, 2021], hard negative mining, or knowledge distillation using pretrained cross-encoders [Formal et al, 2021a].…”
Section: Introduction
confidence: 99%