Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS) 2020
DOI: 10.18653/v1/2020.nlposs-1.6

Flexible retrieval with NMSLIB and FlexNeuART

Abstract: Our objective is to introduce to the NLP community an existing k-NN search library NMSLIB, a new retrieval toolkit FlexNeuART, as well as their integration capabilities. NMSLIB, while being one of the fastest k-NN search libraries, is quite generic and supports a variety of distance/similarity functions. Because the library relies on distance-based, structure-agnostic algorithms, it can be further extended by adding new distances. FlexNeuART is a modular, extendible and flexible toolkit for candidate generation …

Citations: cited by 16 publications (21 citation statements)
References: 52 publications
“…Long MS MARCO doc documents are truncated to 445 first BERT tokens, but such shortening leads to only small (≈ 1%) loss in accuracy [4]. Experiments are carried out using a retrieval toolkit FlexNeuART [5]. We measure effectiveness using the mean reciprocal rank (MRR), which is an official metric for MS MARCO data [7].…”
Section: Methods (mentioning)
confidence: 99%
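The mean reciprocal rank mentioned in this statement can be illustrated with a minimal sketch. The function name, the `cutoff` parameter, and the toy document IDs below are assumptions made for illustration; they are not taken from FlexNeuART or from the MS MARCO evaluation scripts.

```python
from typing import Optional, Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    cutoff: Optional[int] = None,
) -> float:
    """Average, over queries, of 1/rank of the first relevant document.

    rankings[i] is the ranked list of document IDs returned for query i,
    relevant[i] is the set of relevant document IDs for that query.
    A query with no relevant document in the (optionally truncated)
    ranking contributes 0. `cutoff` is a hypothetical parameter that
    limits how deep into each ranking we look.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        considered = ranked if cutoff is None else ranked[:cutoff]
        for position, doc_id in enumerate(considered, start=1):
            if doc_id in rel:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


# Toy data: first relevant hits at ranks 2, 1, and "not found".
toy_rankings = [["d1", "d2", "d3"], ["d4", "d5"], ["d6"]]
toy_relevant = [{"d2"}, {"d4"}, {"d9"}]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
```

With the toy data, the reciprocal ranks are 1/2, 1, and 0, so `toy_mrr` is their average, 0.5.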
“…Translation-based features: Capturing semantic relationships between a query and a document is also crucial to improving retrieval accuracy. To incorporate such features, we can use a translation model (Boytsov and Nyberg, 2020; Boytsov and Kolter, 2021) to measure the log translation probability between queries and documents. The conditional probability we need p(q|d_n) is generated by the IBM Model 1 translation model, and the final query-document feature is the sum of all individual conditional query probabilities.…”
Section: Learning-to-rank Features (mentioning)
confidence: 99%
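One way to read the translation-based feature described in this statement is sketched below. This is an illustrative assumption, not the cited authors' or FlexNeuART's implementation: the function name, the toy translation table, the maximum-likelihood document term probabilities, and the `floor` constant that avoids `log(0)` are all hypothetical. Each query term's probability is computed by summing translation probabilities over document terms, and the per-term log probabilities are then added up.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def model1_log_prob(
    query: Sequence[str],
    doc: Sequence[str],
    trans_prob: Dict[Tuple[str, str], float],
    floor: float = 1e-9,
) -> float:
    """Log probability of the query given the document, Model 1 style.

    For each query term q:
        p(q | d) = sum over document terms w of T(q | w) * P(w | d)
    where T is a translation table keyed by (query_term, doc_term) and
    P(w | d) is the relative frequency of w in the document.
    The feature is the sum of log p(q | d) over all query terms.
    `floor` is an assumed stand-in for proper smoothing.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)

    total = 0.0
    for q in query:
        p = sum(
            trans_prob.get((q, w), 0.0) * count / doc_len
            for w, count in doc_counts.items()
        )
        total += math.log(max(p, floor))
    return total


# Hypothetical translation table: T(query_term | doc_term).
toy_table = {
    ("car", "auto"): 0.5,
    ("car", "car"): 0.9,
    ("fast", "quick"): 0.4,
}
toy_score = model1_log_prob(
    ["car", "fast"], ["auto", "quick", "auto", "road"], toy_table
)
```

In the toy example, p(car|d) = 0.5 · 2/4 = 0.25 and p(fast|d) = 0.4 · 1/4 = 0.1, so the feature equals log(0.25) + log(0.1) = log(0.025). A practical system would typically replace the `floor` with smoothing against collection-level term probabilities.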
“…These algorithms are also widely used in the industry at scale. However, all known graph indices are static and do not support updates, especially delete requests [18], possibly due to the fact that simple graph modification rules for insertions and deletions do not retain the same graph quality over a stream of insertions and deletions.…”
Section: Shortcoming Of Existing Algorithms (mentioning)
confidence: 99%
“…As a result, the current practice in industry is to periodically re-build such indices from scratch [18] to manifest recent changes to the underlying dataset. However, this is a very expensive operation.…”
Section: Shortcoming Of Existing Algorithms (mentioning)
confidence: 99%