2019
DOI: 10.1109/tkde.2018.2828095

Fast Cosine Similarity Search in Binary Space with Angular Multi-Index Hashing

Abstract: Given a large dataset of binary codes and a binary query point, we address how to efficiently find the K codes in the dataset that yield the largest cosine similarities to the query. The straightforward answer to this problem is to compare the query with all items in the dataset, but this is practical only for small datasets. One potential solution to reduce the search time and achieve sublinear cost is to use a hash table populated with binary codes of the dataset and then look up the buckets near the query…
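As a point of reference for the "straightforward answer" above, here is a minimal linear-scan sketch, assuming binary codes are packed into Python integers. For 0/1 vectors the dot product is the popcount of the bitwise AND, and each norm is the square root of the code's popcount. The function names are hypothetical, and treating an all-zero code as having similarity 0 is an assumed convention, not something the abstract specifies.

```python
import heapq
import math


def popcount(x: int) -> int:
    # Portable bit count (int.bit_count() needs Python 3.10+).
    return bin(x).count("1")


def binary_cosine(a: int, b: int) -> float:
    """Cosine similarity of two binary codes packed into ints.

    For 0/1 vectors: dot(a, b) = popcount(a & b), ||a|| = sqrt(popcount(a)).
    """
    na, nb = popcount(a), popcount(b)
    if na == 0 or nb == 0:
        return 0.0  # assumed convention: an all-zero code scores 0
    return popcount(a & b) / math.sqrt(na * nb)


def linear_scan_topk(dataset, query, k):
    """Return (index, similarity) pairs for the k codes most similar to query."""
    scored = ((binary_cosine(code, query), i) for i, code in enumerate(dataset))
    best = heapq.nlargest(k, scored)  # O(N log k) after N similarity evaluations
    return [(i, s) for s, i in best]
```

For example, `linear_scan_topk([0b1100, 0b1010, 0b1111, 0b0001], 0b1100, 2)` ranks index 0 (similarity 1.0) first and index 2 (about 0.707) second. Every query touches all N codes, which is exactly the linear cost that hash-table lookups aim to avoid.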

Cited by 15 publications (14 citation statements); references 39 publications.
“…Even [8] showing cosine similarity measure has an accuracy of up to 0.99 in measuring semantic vectors that have been reduced to 8 bits. Moreover [9] also shows that CSM faster than the linear scan and approximation methods.…”
Section: Introduction (mentioning)
confidence: 91%
“…Cosine similarity [21] uses the cosine of the angles of two vectors in the vector space to measure the similarity between them. The closer the cosine value is to 1, the closer the angle is to 0, the more similar they are.…”
Section: Related Work (mentioning)
confidence: 99%
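For binary codes, the definition quoted above reduces to counting bits. With $\mathbf{p}, \mathbf{q} \in \{0,1\}^n$ and $\lvert\cdot\rvert$ denoting the number of ones (and assuming neither code is all zeros), $\mathbf{p}^{\top}\mathbf{q} = \lvert \mathbf{p} \wedge \mathbf{q} \rvert$ and $\lVert \mathbf{p} \rVert_2 = \sqrt{\lvert \mathbf{p} \rvert}$. The worked values below are an illustrative example, not taken from the cited work:

```latex
\cos(\mathbf{p}, \mathbf{q})
  = \frac{\mathbf{p}^{\top}\mathbf{q}}{\lVert \mathbf{p} \rVert_2 \, \lVert \mathbf{q} \rVert_2}
  = \frac{\lvert \mathbf{p} \wedge \mathbf{q} \rvert}{\sqrt{\lvert \mathbf{p} \rvert \, \lvert \mathbf{q} \rvert}},
\qquad
\mathbf{p} = 1100,\; \mathbf{q} = 1111:\;
\cos(\mathbf{p}, \mathbf{q}) = \frac{2}{\sqrt{2 \cdot 4}} = \frac{1}{\sqrt{2}} \approx 0.707,\;
\theta = 45^\circ .
```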
“…For the local search, we again use alternative optimization technique. Given $\{b_{ij}\}_{j \neq t}$ fixed, $b_{it}$ is updated by exhaustively checking all codewords of $C_j$ and finding the element that minimizes the objective function in (13). For the perturbation procedure of SLS, we randomly choose k codes by sampling from the uniform distribution $U(1, m)$.…”
Section: Optimization (mentioning)
confidence: 99%
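The quoted procedure (coordinate-wise updates plus random perturbation) can be sketched as follows. Objective (13) is not shown here, so this sketch *assumes* an additive-quantization reconstruction error $\lVert x - \sum_j C_j[b_j] \rVert^2$ for a single vector, samples the k perturbed positions without replacement, and keeps a perturbed solution only if it lowers the error. All function names are hypothetical; this illustrates the general stochastic-local-search pattern, not the cited authors' code.

```python
import random


def reconstruct(codebooks, codes):
    """Assumed model: x_hat = sum_j C_j[b_j]."""
    out = [0.0] * len(codebooks[0][0])
    for C, b in zip(codebooks, codes):
        for d, v in enumerate(C[b]):
            out[d] += v
    return out


def sq_error(x, codebooks, codes):
    # Stand-in for the (unshown) objective (13): squared reconstruction error.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, reconstruct(codebooks, codes)))


def local_search(x, codebooks, codes, sweeps=3):
    """Alternating optimization: update b_t exhaustively with b_j (j != t) fixed."""
    codes = list(codes)
    for _ in range(sweeps):
        for t, C in enumerate(codebooks):
            codes[t] = min(
                range(len(C)),
                key=lambda c: sq_error(x, codebooks, codes[:t] + [c] + codes[t + 1:]),
            )
    return codes


def perturb(codes, codebooks, k, rng):
    """Reassign k of the m code positions (chosen uniformly) to random codewords."""
    codes = list(codes)
    for t in rng.sample(range(len(codebooks)), k):
        codes[t] = rng.randrange(len(codebooks[t]))
    return codes


def stochastic_local_search(x, codebooks, k=1, iters=10, seed=0):
    rng = random.Random(seed)
    best = local_search(x, codebooks, [0] * len(codebooks))
    best_err = sq_error(x, codebooks, best)
    for _ in range(iters):
        cand = local_search(x, codebooks, perturb(best, codebooks, k, rng))
        err = sq_error(x, codebooks, cand)
        if err < best_err:  # accept only strict improvements
            best, best_err = cand, err
    return best, best_err
```

The perturbation gives the coordinate-wise search a chance to escape local minima, since a single `local_search` pass can only change one code at a time.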
“…Binary-valued representation has several advantages, such as being compact to store and faster to compare, making it a suitable fit for large-scale nearest neighbor search. Moreover, for binary strings, one can achieve sublinear query time using hash tables [13,34] or tree-based indexing data structures [11,12]. Finding compact binary codes that better respect the given notion of similarity has been the topic of much work over the last two decades during which a rich set of hashing techniques has been proposed.…”
Section: Introduction (mentioning)
confidence: 99%
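To make the "sublinear query time using hash tables" idea concrete, here is a minimal sketch of a single hash table keyed by the exact bit pattern, probed at every bucket within Hamming radius r of the query. The names are hypothetical. This is plain Hamming-ball probing, not the angular multi-index hashing proposed in this paper, and it returns candidates grouped by Hamming distance rather than ranked by cosine similarity.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each exact binary code to the indices of dataset items with that code."""
    table = defaultdict(list)
    for i, c in enumerate(codes):
        table[c].append(i)
    return table


def hamming_ball(query, nbits, radius):
    """Yield every nbits-bit code within Hamming distance `radius` of query."""
    for r in range(radius + 1):
        for flips in combinations(range(nbits), r):
            mask = 0
            for b in flips:
                mask |= 1 << b
            yield query ^ mask


def lookup(table, query, nbits, radius):
    hits = []
    for probe in hamming_ball(query, nbits, radius):
        hits.extend(table.get(probe, []))  # .get does not insert empty buckets
    return hits
```

The number of probes is $\sum_{r' \le r} \binom{n}{r'}$, which grows quickly with code length n and radius r. Multi-index hashing addresses this by splitting each code into substrings and indexing each substring in its own table.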