Query Adaptive Similarity for Large Scale Object Retrieval

Qin, Danfeng; Wengert, Christian; Gool, Luc Van

doi:10.1109/cvpr.2013.211

Cited by 73 publications

(48 citation statements)

References 17 publications


“…While the average distance between a word to its neighbors is regularized to be almost constant in [92], the idea of democratizing the contribution of individual embeddings has later been employed in [18]. In [20], Tolias et al show that VLAD and HE share similar natures and propose a new match kernel which trades off between local feature aggregation and feature-to-feature matching, using a similar matching function to [91]. They also demonstrate that using more bits (e.g., 128) in HE is superior to the original 64 bits scheme at the cost of decreased efficiency.…”

Section: Hamming Embedding and Its Improvements

mentioning

confidence: 99%

“…It exploits the vector-to-hyperplane distance while retaining the efficiency of the inverted index. Further, Qin et al [91] design a higher-order match kernel within a probabilistic framework and adaptively normalize the local feature distances by the distance distribution of false matches. This method is in the spirit similar to [92], in which the word-word distance, instead of the feature-feature distance [91], is normalized, according to the neighborhood distribution of each visual word.…”

Section: Hamming Embedding and Its Improvements

mentioning

confidence: 99%

“…Further, Qin et al [91] design a higher-order match kernel within a probabilistic framework and adaptively normalize the local feature distances by the distance distribution of false matches. This method is in the spirit similar to [92], in which the word-word distance, instead of the feature-feature distance [91], is normalized, according to the neighborhood distribution of each visual word. While the average distance between a word to its neighbors is regularized to be almost constant in [92], the idea of democratizing the contribution of individual embeddings has later been employed in [18].…”

Section: Hamming Embedding and Its Improvements

mentioning

confidence: 99%


SIFT Meets CNN: A Decade Survey of Instance Retrieval

Zheng, Yang, Tian (2018). IEEE Trans. Pattern Anal. Mach. Intell.

656

319


Abstract: In the early days, content-based image retrieval (CBIR) was studied with global features. Since 2003, image retrieval based on local descriptors (de facto SIFT) has been extensively studied for over a decade due to the advantage of SIFT in dealing with image transformations. Recently, image representations based on the convolutional neural network (CNN) have attracted increasing interest in the community and demonstrated impressive performance. Given this time of rapid evolution, this article provides a comprehensive survey of instance retrieval over the last decade. Two broad categories, SIFT-based and CNN-based methods, are presented. For the former, according to the codebook size, we organize the literature into using large/medium-sized/small codebooks. For the latter, we discuss three lines of methods, i.e., using pre-trained or fine-tuned CNN models, and hybrid methods. The first two perform a single-pass of an image to the network, while the last category employs a patch-based feature extraction scheme. This survey presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT and CNN-based methods. After analyzing and comparing retrieval performance of different categories on several datasets, we discuss promising directions towards generic and specialized instance retrieval.


“…All relevant learning-based approaches fall into one or both of the following two categories: (i) learning for an auxiliary task (e.g. some form of distinctiveness of local features [4,15,30,35,58,59,90]), and (ii) learning on top of shallow hand-engineered descriptors that cannot be finetuned for the target task [2,9,24,35,57]. Both of these are in spirit opposite to the core idea behind deep learning that has provided a major boost in performance in various recognition tasks: end-to-end learning.…”

Section: Related Work

mentioning

confidence: 99%

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

Arandjelović, Gronát, Torii, et al. (2018). IEEE Trans. Pattern Anal. Mach. Intell.

464

170


We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and improves over current state-of-the-art compact image representations on standard image retrieval benchmarks.


“…Second, during matching verification, the Hamming distance between two binary features can be efficiently calculated via xor operations, while the Euclidean distance between floating-point vectors is very expensive to compute. Previous work of this line includes Hamming Embedding (HE) [1] and its variants [10], [11], which use binary SIFT features for verification. Meanwhile, binary features also include spatial context [12], heterogeneous feature such as color [13], etc.…”

mentioning

confidence: 99%
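The XOR-based Hamming distance mentioned in the citation statement above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the signature width, the function names `hamming_distance` and `verify_matches`, and the `threshold` parameter are assumptions chosen for illustration only.

```python
def hamming_distance(sig_a: int, sig_b: int) -> int:
    """Hamming distance between two binary signatures stored as ints.

    XOR leaves a 1 in every bit position where the signatures differ;
    counting those 1 bits gives the distance.
    """
    return bin(sig_a ^ sig_b).count("1")


def verify_matches(query_sig: int, candidate_sigs: list, threshold: int) -> list:
    """Return indices of candidates within `threshold` bits of the query.

    `threshold` is a hypothetical tuning parameter, not a value from the source.
    """
    return [
        i
        for i, sig in enumerate(candidate_sigs)
        if hamming_distance(query_sig, sig) <= threshold
    ]


if __name__ == "__main__":
    # Toy 4-bit signatures; real systems typically use 64 bits or more.
    query = 0b1011
    candidates = [0b1011, 0b0010, 0b0100]
    print([hamming_distance(query, c) for c in candidates])
    print(verify_matches(query, candidates, threshold=1))
```

Because the comparison reduces to one XOR and a bit count per pair, it avoids the per-dimension floating-point arithmetic that a Euclidean distance between descriptor vectors would need.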

Coupled Binary Embedding for Large-Scale Image Retrieval

Zheng, Wang, Tian (2014). IEEE Trans. on Image Process.

134


Abstract: Visual matching is a crucial step in image retrieval based on the bag-of-words (BoW) model. In the baseline method, two keypoints are considered as a matching pair if their SIFT descriptors are quantized to the same visual word. However, the SIFT visual word has two limitations. First, it loses most of its discriminative power during quantization. Second, SIFT only describes the local texture feature. Both drawbacks impair the discriminative power of the BoW model and lead to false positive matches. To tackle this problem, this paper proposes to embed multiple binary features at indexing level. To model correlation between features, a multi-IDF scheme is introduced, through which different binary features are coupled into the inverted file. We show that matching verification methods based on binary features, such as Hamming embedding, can be effectively incorporated in our framework. As an extension, we explore the fusion of binary color feature into image retrieval. The joint integration of the SIFT visual word and binary features greatly enhances the precision of visual matching, reducing the impact of false positive matches. Our method is evaluated through extensive experiments on four benchmark datasets (Ukbench, Holidays, DupImage, and MIR Flickr 1M). We show that our method significantly improves the baseline approach. In addition, large-scale experiments indicate that the proposed method requires acceptable memory usage and query time compared with other approaches. Further, when global color feature is integrated, our method yields competitive performance with the state-of-the-arts.

Index Terms: Feature fusion, coupled binary embedding, multi-IDF, image retrieval.

