Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph

Fu, Cong; Xiang, Chao; Wang, Changxu

doi:10.48550/arxiv.1707.00143

Cited by 11 publications

(18 citation statements)

References 0 publications

“…In the implementation of this paper, the overall empirical indexing complexity of the NSG is $O(kn^{\frac{1+d}{d}} \log n^{\frac{1}{d}} + f(n))$ ($f(n) = n^{1.16}$ with nn-descent and $f(n) = n \log n$ with Faiss), which is much lower than $O(n^2 \log n + cn^2)$ of the MRNG.…”

Section: Indexing Complexity of NSG (mentioning)

confidence: 97%

Fast approximate nearest neighbor search with the navigating spreading-out graph

2019

Self Cite

Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet they still cannot scale to billion-node databases. In this paper, to further improve the search efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG), which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the e-commerce search scenario of Taobao (Alibaba Group) and has been integrated into their search engine at billion-node scale.

[pseudocode fragment] … 11: end while; 12: return the first k nodes in S

… approximates the Relative Neighborhood Graphs (RNG) [37], and Hierarchical NSW (HNSW) [33] is proposed to take advantage of properties of the Delaunay Graph, the NSWN, and the RNG. Moreover, a hierarchical structure is used in HNSW to enable multi-scale hopping on different layers of the graph. These approximations are mainly based on intuition and generally lack rigorous theoretical support. In our experimental study, we find that they are still not powerful enough for billion-node applications, which are in great demand today.

To further improve the search efficiency and scalability of graph-based methods, we start with how ANNS is performed on a graph. Despite the diversity of graph indices, almost all graph-based methods [3,7,13,21,26,32] share the same greedy best-first search algorithm (given in Algorithm 1); we refer to it as the search-on-graph algorithm below. Algorithm 1 tries to reach the query point with the following greedy process. For a given query q, we are required to retrieve its nearest neighbors from the dataset. Given a starting node p, we follow the out-edges to reach p's neighbors, and compare them with q to choose one to proceed. The choosing principle is to minimize the distance to q, and the new iterati...
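
The greedy process described in the excerpt above (start at a node, follow out-edges, keep moving toward whichever candidate is closest to q, and finally return the first k nodes of a pool S) can be illustrated with a minimal Python sketch. This is not the paper's Algorithm 1, which is only partially visible here; the function name `search_on_graph`, the `pool_size` parameter, the adjacency-dict graph representation, and the stopping rule are all assumptions made for illustration.

```python
import heapq
import math


def search_on_graph(graph, points, query, start, k, pool_size):
    """Greedy best-first search over a neighbor graph (illustrative sketch).

    graph:  dict mapping node id -> list of out-neighbor ids (assumed layout)
    points: sequence of coordinate tuples, indexed by node id
    """
    def dist(i):
        return math.dist(points[i], query)

    visited = {start}
    # Min-heap of nodes not yet expanded, keyed by distance to the query.
    frontier = [(dist(start), start)]
    # Candidate pool S, kept sorted by distance and capped at pool_size.
    pool = [(dist(start), start)]

    while frontier:
        d, node = heapq.heappop(frontier)
        # Assumed stopping rule: the closest unexpanded node is farther
        # than the worst entry of a full pool, so it cannot improve it.
        if len(pool) >= pool_size and d > pool[-1][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            entry = (dist(nb), nb)
            heapq.heappush(frontier, entry)
            pool.append(entry)
        pool.sort()
        del pool[pool_size:]

    # Return the first k nodes in the pool, nearest first.
    return [node for _, node in pool[:k]]
```

On a toy chain graph over the points 0, 1, 2, 3, 4 on a line, calling `search_on_graph(graph, points, (3.2,), start=0, k=2, pool_size=3)` walks from node 0 toward the query and returns the two nodes nearest to 3.2. A larger `pool_size` trades search time for accuracy, which is the usual knob in searches of this kind.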

“…The real CBIR system has much larger size and much more complicated images. Our experimental results suggest a two-step approach for CBIR: 1) using complicated models (e.g., deep learning) to learn semantic real value features and 2) using advanced ANNS methods [18,5] to achieve fast retrieval.…”

Section: Partially Supervised Setting (mentioning)

confidence: 94%

A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval

Gu, Wang

2017

Preprint

Self Cite

There is a growing trend in studying deep hashing methods for content-based image retrieval (CBIR), where hash functions and binary codes are learnt using deep convolutional neural networks and the binary codes are then used for approximate nearest neighbor (ANN) search. All the existing deep hashing papers report their methods' superior performance over the traditional hashing methods according to their experimental results. However, there are serious flaws in the evaluations of existing deep hashing papers: (1) The datasets they used are too small and simple to simulate the real CBIR situation. (2) They did not correctly include the search time in their evaluation criteria, although the search time is crucial in real CBIR systems. (3) The performance of some unsupervised hashing algorithms (e.g., LSH) can easily be boosted by using multiple hash tables, an important factor that should be considered in the evaluation but that most deep hashing papers failed to consider. We re-evaluate several state-of-the-art deep hashing methods with a carefully designed experimental setting. Empirical results reveal that the performance of these deep hashing methods is inferior to multi-table IsoH, a very simple unsupervised hashing method. Thus, the conclusions in all the deep hashing papers should be carefully re-examined.
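
To make point (3) of the abstract above concrete, here is a minimal Python sketch of multi-table hashing. It uses random-hyperplane LSH as a stand-in, not IsoH and not any code from the cited paper; the class name `MultiTableLSH` and its parameters (`dim`, `bits`, `tables`, `seed`) are hypothetical. The idea it illustrates is that a query's candidate set is the union of its buckets across tables, so adding tables can only add candidates and therefore tends to raise recall.

```python
import random
from collections import defaultdict


def make_hash(dim, bits, rng):
    """Build a random-hyperplane hash: one sign bit per hyperplane."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def h(v):
        return tuple(
            sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0 for p in planes
        )

    return h


class MultiTableLSH:
    """Several independent hash tables over the same data (illustrative)."""

    def __init__(self, dim, bits, tables, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, bits, rng) for _ in range(tables)]
        self.tables = [defaultdict(list) for _ in range(tables)]

    def add(self, idx, v):
        # Insert the item into its bucket in every table.
        for h, table in zip(self.hashes, self.tables):
            table[h(v)].append(idx)

    def candidates(self, q):
        # Union of the query's buckets across all tables.
        found = set()
        for h, table in zip(self.hashes, self.tables):
            found.update(table.get(h(q), ()))
        return found
```

The candidates would normally be re-ranked by exact distance afterwards. The cost of extra tables is memory and lookup time, which is why the abstract argues that search time must be part of any fair comparison.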

“…recommendation systems, search engines, remote sensing systems. Among all the methods proposed for this challenging task [16,17,27,38], hamming hash-based methods have achieved pronounced successes. It aims to learn a hash function mapping the images in the high-dimensional pixel space into low-dimensional hamming space while preserving their visual similarity in the original pixel space.…”

Section: Introduction (mentioning)

confidence: 99%

TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval

Chen, Zhang, Liu et al.

2021

Preprint

Deep hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g. ResNet [21]. In this paper, inspired by the recent advancements of vision transformers, we present TransHash, a pure transformer-based framework for deep hashing learning. Concretely, our framework is composed of two major modules: (1) Based on Vision Transformer (ViT), we design a siamese vision transformer backbone for image feature extraction. To learn fine-grained features, we introduce a dual-stream feature learning scheme on top of the transformer to learn discriminative global and local features. (2) Besides, we adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner. To the best of our knowledge, this is the first work to tackle deep hashing learning problems without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The experiments demonstrate superiority over the existing state-of-the-art deep hashing methods. Specifically, we achieve 8.2%, 2.6%, and 12.7% performance gains in terms of average mAP for different hash bit lengths on the three public datasets, respectively.
