2021
DOI: 10.1007/978-3-030-72240-1_23
A White Box Analysis of ColBERT

Cited by 23 publications (17 citation statements)
References 8 publications
“…Indeed, Formal et al. [12] showed that the dot product φ_{q_i}^T φ_{d_j} used by ColBERT implicitly encapsulates token importance, by giving higher scores to tokens that have higher IDF values.…”
Section: Multi Representation Dense Retrieval
confidence: 99%
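The token-level dot-product scoring quoted above can be sketched as a minimal late-interaction ("MaxSim") scorer: each query token embedding is compared against every document token embedding, the best match per query token is kept, and the maxima are summed. This is an illustrative reconstruction, not the paper's code; embedding dimensions and the random inputs are assumptions.

```python
# Hypothetical sketch of ColBERT-style late interaction: for each query
# token embedding phi_{q_i}, take the maximum dot product against all
# document token embeddings phi_{d_j}, then sum over query tokens.
import numpy as np

def colbert_score(Q: np.ndarray, D: np.ndarray) -> float:
    """Q: (num_query_tokens, dim), D: (num_doc_tokens, dim), rows L2-normalized."""
    sims = Q @ D.T                        # all pairwise dot products phi_{q_i} . phi_{d_j}
    return float(sims.max(axis=1).sum())  # MaxSim per query token, summed

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D = rng.normal(size=(10, 8)); D /= np.linalg.norm(D, axis=1, keepdims=True)
print(colbert_score(Q, D))
```

With row-normalized embeddings every dot product is a cosine similarity, so the score is bounded above by the number of query tokens; tokens whose embeddings carry more "importance" (e.g. high-IDF terms, per the quoted finding) contribute larger per-token maxima.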
“…Previous works scrutinizing BERT-based ranking models either relied on axiomatic approaches adapted to neural models [1,17], controlled experiments [11], or direct investigation of the learned representations [9,7] or attention [19]. This line of work has shown -among other findings -that these models, which rely on contextualized semantic matching, are actually still quite sensitive to lexical match and term statistics in documents/collections [9,7]. However, these observations are based on specifically tailored approaches that cannot directly be applied to any given model.…”
Section: Introduction
confidence: 99%
“…Similarly, we postulate that the most important query tokens are more likely to bring relevant documents than non-relevant documents, and therefore we propose to prune (remove) the unimportant query embeddings. Indeed, Formal et al. [3] noted that exact matches and the more important terms contribute more to the overall ColBERT scores; we argue that these terms are those that should be the focus of the ANN search.…”
Section: Query Embedding Pruning
confidence: 62%
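The pruning idea quoted above can be sketched as dropping the embeddings of low-IDF query tokens before the ANN search. The token list, IDF table, and the number of tokens kept below are illustrative assumptions, not values from the cited work.

```python
# Hypothetical sketch of query embedding pruning: keep only the highest-IDF
# query tokens (the "important" terms) and drop the rest, so that fewer
# query embeddings are sent to the approximate nearest-neighbour search.
def prune_query_tokens(tokens, idf, keep=2):
    """Keep the `keep` highest-IDF query tokens, preserving query order."""
    ranked = sorted(tokens, key=lambda t: idf.get(t, 0.0), reverse=True)
    kept = set(ranked[:keep])
    return [t for t in tokens if t in kept]

# Toy IDF table (assumed values for illustration only).
idf = {"the": 0.1, "of": 0.2, "chemical": 5.3, "reactions": 4.8}
print(prune_query_tokens(["the", "chemical", "reactions", "of"], idf))
# → ['chemical', 'reactions']
```

In a real system the pruned token list would map to the corresponding rows of the query embedding matrix, so only those rows participate in the ANN retrieval stage.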