Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.443

Multi-Vector Attention Models for Deep Re-ranking

Abstract: Large-scale document retrieval systems often utilize two styles of neural network models which live at two different ends of the joint computation vs. accuracy spectrum. The first style is dual encoder (or two-tower) models, where the query and document representations are computed completely independently and combined with a simple dot product operation. The second style is cross-attention models, where the query and document features are concatenated in the input layer and all computation is based on the joi…
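The contrast the abstract draws between the two model styles can be made concrete with a short sketch. The code below is a minimal illustration, not the paper's implementation: the encoders are toy stand-ins (in practice both would be pretrained Transformers such as BERT), and all module names and sizes are assumptions chosen for readability.

```python
# Minimal sketch of the two scoring styles described in the abstract (assumed, illustrative code).
import torch
import torch.nn as nn

HIDDEN = 128

class DualEncoder(nn.Module):
    """Query and document are encoded independently; relevance is a dot product."""
    def __init__(self, vocab_size=30522, dim=HIDDEN):
        super().__init__()
        self.query_encoder = nn.EmbeddingBag(vocab_size, dim)  # stand-in encoder
        self.doc_encoder = nn.EmbeddingBag(vocab_size, dim)    # stand-in encoder

    def score(self, query_ids, doc_ids):
        q = self.query_encoder(query_ids)  # [batch, dim], computed without seeing the document
        d = self.doc_encoder(doc_ids)      # [batch, dim], can be precomputed and indexed offline
        return (q * d).sum(-1)             # simple dot product

class CrossAttentionScorer(nn.Module):
    """Query and document are concatenated at the input; every layer attends over both jointly."""
    def __init__(self, vocab_size=30522, dim=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, 1)

    def score(self, query_ids, doc_ids):
        joint = torch.cat([query_ids, doc_ids], dim=1)  # concatenate query and document tokens
        h = self.encoder(self.embed(joint))             # all computation is joint
        return self.classifier(h[:, 0]).squeeze(-1)     # score read from the first position

query = torch.randint(0, 30522, (2, 8))    # toy token ids
doc = torch.randint(0, 30522, (2, 64))
print(DualEncoder().score(query, doc).shape)            # torch.Size([2])
print(CrossAttentionScorer().score(query, doc).shape)   # torch.Size([2])
```

The practical trade-off is the one the abstract names: the dual encoder lets document vectors be precomputed and searched cheaply, while the cross-attention scorer must rerun the full model for every query-document pair, which is why it is typically reserved for re-ranking a short candidate list.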

Cited by 10 publications (4 citation statements). References 5 publications.

“…Unlike learned dense representations, our vocabulary-based representations may have more limited representational power. Recent work demonstrates that even in the case of learned dense representations, multiple representations can improve model performance (Lee et al., 2023; Zhou and Devlin, 2021). This work also does not evaluate the upper bound on such vocabulary-based representations.…”
Section: Limitations
mentioning confidence: 99%
“…On the representational side, we focus on reducing the storage cost using residual compression, achieving strong gains in reducing footprint while largely preserving quality. Nonetheless, we have not exhausted the space of more sophisticated optimizations possible, and we would expect more sophisticated forms of residual compression and composing our approach with dropping tokens (Zhou and Devlin, 2021) to open up possibilities for further reductions in space footprint.…”
Section: Research Limitations
mentioning confidence: 99%
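For readers unfamiliar with the residual compression mentioned in the statement above, the following is a rough sketch of the general idea: store each token vector as the id of its nearest centroid plus a coarsely quantized residual. The centroid count, the one-byte uniform quantizer, and the function names here are all assumptions made for illustration; the cited system's actual codec and parameters may differ substantially.

```python
# Illustrative residual compression: centroid id + quantized residual (assumed scheme, not the cited codec).
import numpy as np

def compress(vectors, centroids, n_levels=256):
    """Encode each vector as (nearest-centroid id, uniformly quantized residual)."""
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # [n, k] squared distances
    codes = dists.argmin(axis=1)                                          # nearest centroid per vector
    residuals = vectors - centroids[codes]
    scale = np.abs(residuals).max() + 1e-9                                # uniform scalar quantizer
    quantized = np.round((residuals / scale) * (n_levels // 2 - 1)).astype(np.int8)
    return codes.astype(np.int32), quantized, scale

def decompress(codes, quantized, scale, centroids, n_levels=256):
    """Reconstruct approximate vectors from centroid ids and quantized residuals."""
    residuals = quantized.astype(np.float32) / (n_levels // 2 - 1) * scale
    return centroids[codes] + residuals

# Example: 1000 token vectors, 64 centroids, 1 byte per dimension for the residual.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128)).astype(np.float32)
centroids = rng.standard_normal((64, 128)).astype(np.float32)
codes, quantized, scale = compress(vectors, centroids)
approx = decompress(codes, quantized, scale, centroids)
print(np.abs(vectors - approx).mean())  # small reconstruction error
```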
“…Our dynamic approach to reduce the number of vectors needed to represent a passage differs from previous works that focus on fixed numbers of vectors across all passages: Lassance et al. [25] prune ColBERT representations to either 50 or 10 vectors for MSMARCO by sorting tokens either by Inverse Document Frequency (IDF) or the last-layer attention scores of BERT. Zhou and Devlin [59] extend ColBERT with temporal pooling, by sliding a window over the passage representations to create a representation vector every window size steps, with a fixed target count of representation vectors. Luan et al. [35] represent each passage with a fixed number of contextualized embeddings of the CLS token and the first 𝑚 tokens of the passage and score the relevance of the passage with the maximum score of the embeddings.…”
Section: Related Work
mentioning confidence: 99%
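The window-pooling idea attributed to Zhou and Devlin in the statement above can be sketched in a few lines: given a passage's per-token vectors, average over windows of positions so that only a fixed number of representation vectors remain. The pooling operator (mean) and the way window boundaries are computed below are assumptions for illustration; the paper's exact pooling scheme is not reproduced here.

```python
# Illustrative window pooling of token vectors down to a fixed target count (assumed details).
import numpy as np

def window_pool(token_vectors, target_count):
    """Pool [n_tokens, dim] token vectors down to at most target_count vectors."""
    n, _ = token_vectors.shape
    if n <= target_count:
        return token_vectors
    # split token positions into target_count roughly equal, non-overlapping windows
    bounds = np.linspace(0, n, target_count + 1).astype(int)
    pooled = [token_vectors[bounds[i]:bounds[i + 1]].mean(axis=0)
              for i in range(target_count)]
    return np.stack(pooled)

# Example: 180 contextualized token vectors reduced to 16 representation vectors.
passage = np.random.randn(180, 128).astype(np.float32)
print(window_pool(passage, 16).shape)  # (16, 128)
```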