Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.443

Multi-Vector Attention Models for Deep Re-ranking

Abstract: Large-scale document retrieval systems often utilize two styles of neural network models which live at two different ends of the joint computation vs. accuracy spectrum. The first style is dual encoder (or two-tower) models, where the query and document representations are computed completely independently and combined with a simple dot product operation. The second style is cross-attention models, where the query and document features are concatenated in the input layer and all computation is based on the joi…
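The contrast the abstract draws between the two model styles can be made concrete with a short sketch. The code below is a minimal illustration, not the paper's implementation: the encoders are toy stand-ins (in practice both would be pretrained Transformers such as BERT), and all module names and sizes are assumptions chosen for readability.

```python
# Minimal sketch of the two scoring styles described in the abstract (assumed, illustrative code).
import torch
import torch.nn as nn

HIDDEN = 128

class DualEncoder(nn.Module):
    """Query and document are encoded independently; relevance is a dot product."""
    def __init__(self, vocab_size=30522, dim=HIDDEN):
        super().__init__()
        self.query_encoder = nn.EmbeddingBag(vocab_size, dim)  # stand-in encoder
        self.doc_encoder = nn.EmbeddingBag(vocab_size, dim)    # stand-in encoder

    def score(self, query_ids, doc_ids):
        q = self.query_encoder(query_ids)  # [batch, dim], computed without seeing the document
        d = self.doc_encoder(doc_ids)      # [batch, dim], can be precomputed and indexed offline
        return (q * d).sum(-1)             # simple dot product

class CrossAttentionScorer(nn.Module):
    """Query and document are concatenated at the input; every layer attends over both jointly."""
    def __init__(self, vocab_size=30522, dim=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, 1)

    def score(self, query_ids, doc_ids):
        joint = torch.cat([query_ids, doc_ids], dim=1)  # concatenate query and document tokens
        h = self.encoder(self.embed(joint))             # all computation is joint
        return self.classifier(h[:, 0]).squeeze(-1)     # score read from the first position

query = torch.randint(0, 30522, (2, 8))    # toy token ids
doc = torch.randint(0, 30522, (2, 64))
print(DualEncoder().score(query, doc).shape)            # torch.Size([2])
print(CrossAttentionScorer().score(query, doc).shape)   # torch.Size([2])
```

The practical trade-off is the one the abstract names: the dual encoder lets document vectors be precomputed and searched cheaply, while the cross-attention scorer must rerun the full model for every query-document pair, which is why it is typically reserved for re-ranking a short candidate list.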

Cited by 10 publications (4 citation statements). References 5 publications.

“…Unlike learned dense representations, our vocabulary-based representations may have more limited representational power. Recent work demonstrates that even in the case of learned dense representations, multiple representations can improve model performance (Lee et al., 2023; Zhou and Devlin, 2021). This work also does not evaluate the upper bound on such vocabulary-based representations.…”
Section: Limitations
mentioning confidence: 99%
“…On the representational side, we focus on reducing the storage cost using residual compression, achieving strong gains in reducing footprint while largely preserving quality. Nonetheless, we have not exhausted the space of more sophisticated optimizations possible, and we would expect more sophisticated forms of residual compression and composing our approach with dropping tokens (Zhou and Devlin, 2021) to open up possibilities for further reductions in space footprint.…”
Section: Research Limitations
mentioning confidence: 99%
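For readers unfamiliar with the residual compression mentioned in the statement above, the following is a rough sketch of the general idea: store each token vector as the id of its nearest centroid plus a coarsely quantized residual. The centroid count, the one-byte uniform quantizer, and the function names here are all assumptions made for illustration; the cited system's actual codec and parameters may differ substantially.

```python
# Illustrative residual compression: centroid id + quantized residual (assumed scheme, not the cited codec).
import numpy as np

def compress(vectors, centroids, n_levels=256):
    """Encode each vector as (nearest-centroid id, uniformly quantized residual)."""
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # [n, k] squared distances
    codes = dists.argmin(axis=1)                                          # nearest centroid per vector
    residuals = vectors - centroids[codes]
    scale = np.abs(residuals).max() + 1e-9                                # uniform scalar quantizer
    quantized = np.round((residuals / scale) * (n_levels // 2 - 1)).astype(np.int8)
    return codes.astype(np.int32), quantized, scale

def decompress(codes, quantized, scale, centroids, n_levels=256):
    """Reconstruct approximate vectors from centroid ids and quantized residuals."""
    residuals = quantized.astype(np.float32) / (n_levels // 2 - 1) * scale
    return centroids[codes] + residuals

# Example: 1000 token vectors, 64 centroids, 1 byte per dimension for the residual.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128)).astype(np.float32)
centroids = rng.standard_normal((64, 128)).astype(np.float32)
codes, quantized, scale = compress(vectors, centroids)
approx = decompress(codes, quantized, scale, centroids)
print(np.abs(vectors - approx).mean())  # small reconstruction error
```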
“…Our dynamic approach to reduce the number of vectors needed to represent a passage differs from previous works that focus on fixed numbers of vectors across all passages: Lassance et al. [25] prune ColBERT representations to either 50 or 10 vectors for MSMARCO by sorting tokens either by Inverse Document Frequency (IDF) or the last-layer attention scores of BERT. Zhou and Devlin [59] extend ColBERT with temporal pooling, by sliding a window over the passage representations to create a representation vector every window size steps, with a fixed target count of representation vectors. Luan et al. [35] represent each passage with a fixed number of contextualized embeddings of the CLS token and the first 𝑚 tokens of the passage and score the relevance of the passage with the maximum score of the embeddings.…”
Section: Related Work
mentioning confidence: 99%
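The window-pooling idea attributed to Zhou and Devlin in the statement above can be sketched in a few lines: given a passage's per-token vectors, average over windows of positions so that only a fixed number of representation vectors remain. The pooling operator (mean) and the way window boundaries are computed below are assumptions for illustration; the paper's exact pooling scheme is not reproduced here.

```python
# Illustrative window pooling of token vectors down to a fixed target count (assumed details).
import numpy as np

def window_pool(token_vectors, target_count):
    """Pool [n_tokens, dim] token vectors down to at most target_count vectors."""
    n, _ = token_vectors.shape
    if n <= target_count:
        return token_vectors
    # split token positions into target_count roughly equal, non-overlapping windows
    bounds = np.linspace(0, n, target_count + 1).astype(int)
    pooled = [token_vectors[bounds[i]:bounds[i + 1]].mean(axis=0)
              for i in range(target_count)]
    return np.stack(pooled)

# Example: 180 contextualized token vectors reduced to 16 representation vectors.
passage = np.random.randn(180, 128).astype(np.float32)
print(window_pool(passage, 16).shape)  # (16, 128)
```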