2022
DOI: 10.48550/arxiv.2205.04733
Preprint

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

Abstract: Neural retrievers based on dense representations combined with Approximate Nearest Neighbors search have recently received a lot of attention, owing their success to distillation and/or better sampling of examples for training, while still relying on the same backbone architecture. In the meantime, sparse representation learning fueled by traditional inverted indexing techniques has seen a growing interest, inheriting from desirable IR priors such as explicit lexical matching. While some architectural variants…
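The abstract credits recent dense-retriever gains to distillation and to better (harder) negative sampling, the two training ingredients the paper transfers to sparse models. As a rough, generic illustration only, not the paper's exact recipe, the sketch below shows a MarginMSE-style distillation step with mined hard negatives; `student.encode`, `teacher.score`, and `miner.mine_hard_negatives` are hypothetical placeholders for a bi-encoder student, a cross-encoder teacher, and a first-stage index.

```python
# Rough illustration (not the paper's exact recipe): one training step that
# distills a cross-encoder teacher's score margin into a bi-encoder student
# (MarginMSE-style), using hard negatives mined from a first-stage index.
import torch
import torch.nn.functional as F

def margin_mse_loss(q, pos, neg, teacher_pos, teacher_neg):
    """Match the student's (positive - negative) score margin to the teacher's."""
    student_margin = (q * pos).sum(-1) - (q * neg).sum(-1)
    teacher_margin = teacher_pos - teacher_neg
    return F.mse_loss(student_margin, teacher_margin)

def training_step(batch, student, teacher, miner):
    # Hard negatives: highly ranked but non-relevant documents from the index.
    negatives = miner.mine_hard_negatives(batch.queries)       # hypothetical helper
    q = student.encode(batch.queries)                           # hypothetical encoders
    pos = student.encode(batch.positives)
    neg = student.encode(negatives)
    with torch.no_grad():                                       # teacher is frozen
        t_pos = teacher.score(batch.queries, batch.positives)   # cross-encoder scores
        t_neg = teacher.score(batch.queries, negatives)
    return margin_mse_loss(q, pos, neg, t_pos, t_neg)
```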

Cited by 4 publications (10 citation statements)
References 25 publications
“…On the other hand, lexicon-based retrieval models, a.k.a. neural weighting schemes in sparse retrieval methods, are proposed to exploit intrinsic properties of natural language for sparse retrieval [43,42,13,12,14]. Recently, built upon causal language models (CLM) [46,47], [42] proposes to leverage the concurrence between a document and a query for lexicon-based sparse representation expansion.…”
Section: Related Work (mentioning)
confidence: 99%
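The "neural weighting schemes" referenced in this statement replace hand-crafted weights such as BM25 with learned term weights, but retrieval still reduces to a sparse dot product served by an inverted index. The following is a minimal, generic sketch of that scoring scheme, not code from the cited papers; the dictionaries of learned weights are hypothetical inputs.

```python
from collections import defaultdict

def build_inverted_index(doc_term_weights):
    """doc_term_weights: {doc_id: {term: learned_weight}} -> {term: [(doc_id, weight)]}."""
    postings = defaultdict(list)
    for doc_id, weights in doc_term_weights.items():
        for term, w in weights.items():
            postings[term].append((doc_id, w))
    return postings

def lexical_retrieve(query_term_weights, postings, k=10):
    """Score(q, d) = sparse dot product of learned query and document term weights."""
    scores = defaultdict(float)
    for term, qw in query_term_weights.items():
        for doc_id, dw in postings.get(term, ()):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]
```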
“…In addition, we present a uni-retrieval scheme for fast yet effective large-scale retrieval. Instead of adding their scores [28,14] from two separate retrieval passes with heavy overheads, we pipeline the retrieval procedure: given q, our lexicon-based retrieval under an inverted file system retrieves the top-K documents from D. Our dense-vector retrieval is then applied to the constrained candidates to obtain dense scores. The final retrieval results are obtained by simple addition of the two scores.…”
Section: Dual-consistency Learning For Uniter (mentioning)
confidence: 99%
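The pipelined uni-retrieval scheme quoted above is straightforward to express in code: a lexical first stage over an inverted file produces top-K candidates, a dense model re-scores only those candidates, and the two scores are added. Below is a minimal sketch under those assumptions; `lexical_index.search`, `encode_dense`, and `dense_doc_vecs` are hypothetical components, not the cited paper's implementation.

```python
import numpy as np

def pipelined_hybrid_search(query, lexical_index, dense_doc_vecs, encode_dense,
                            first_stage_k=1000, final_k=10):
    """Sketch of a pipelined hybrid retrieval scheme (all components hypothetical):
    sparse retrieval narrows the corpus, dense scoring only touches the candidates,
    and the final ranking uses the simple sum of both scores."""
    # Stage 1: lexical retrieval over an inverted file -> top-K candidates.
    candidates = lexical_index.search(query, first_stage_k)    # [(doc_id, lexical_score)]
    # Stage 2: dense scoring restricted to those candidates (no full ANN search).
    q_vec = encode_dense(query)
    reranked = []
    for doc_id, lex_score in candidates:
        dense_score = float(np.dot(q_vec, dense_doc_vecs[doc_id]))
        reranked.append((doc_id, lex_score + dense_score))
    reranked.sort(key=lambda item: item[1], reverse=True)
    return reranked[:final_k]
```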