2017
DOI: 10.1109/tpami.2016.2613873

Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors

Abstract: Visual search and image retrieval underpin numerous applications; however, the task is still challenging, predominantly due to the variability of object appearance and the ever-increasing size of databases, often exceeding billions of images. Prior-art methods rely on aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including Bag of Visual Words (BoW), Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance is still short of what …
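
As a rough illustration of the three aggregation mechanisms named in the abstract, the NumPy sketch below turns a set of local descriptors (e.g. SIFT, one row per descriptor) into BoW, VLAD and Fisher Vector representations. It is a minimal sketch, not the paper's implementation: it assumes a pre-trained codebook `centroids` and a diagonal GMM (`weights`, `means`, `variances`) are already available, all function names are hypothetical, and the signed square-root followed by L2 normalization is a common post-processing choice assumed here.

```python
import numpy as np

def assign_hard(descs, centroids):
    """Index of the nearest visual word for every descriptor (squared L2)."""
    d2 = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def power_l2(v, alpha=0.5):
    """Signed power normalization followed by L2 normalization (assumed choice)."""
    v = np.sign(v) * np.abs(v) ** alpha
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def bow(descs, centroids):
    """Bag of Visual Words: L1-normalized histogram of hard assignments."""
    hist = np.bincount(assign_hard(descs, centroids), minlength=len(centroids)).astype(float)
    return hist / max(hist.sum(), 1.0)

def vlad(descs, centroids):
    """VLAD: per-cell sum of residuals (x - c_k), concatenated over all cells."""
    a = assign_hard(descs, centroids)
    v = np.zeros_like(centroids, dtype=float)
    np.add.at(v, a, descs - centroids[a])  # unbuffered scatter-add into each cell
    return power_l2(v.ravel())

def fisher_vector(descs, weights, means, variances):
    """Fisher Vector: gradients w.r.t. the means and variances of a diagonal GMM."""
    n = len(descs)
    diff = descs[:, None, :] - means[None]  # (N, K, D)
    log_p = np.log(weights)[None] - 0.5 * (
        np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]
    ).sum(axis=-1)  # (N, K) log of w_k * N(x | mu_k, sigma_k)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerically stable soft-max
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # posteriors, (N, K)
    sigma = np.sqrt(variances)
    g_mu = (gamma[..., None] * diff / sigma[None]).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_var = (gamma[..., None] * (diff ** 2 / variances[None] - 1)).sum(axis=0) / (
        n * np.sqrt(2 * weights)[:, None]
    )
    return power_l2(np.concatenate([g_mu.ravel(), g_var.ravel()]))
```

BoW keeps only counts per cell, VLAD adds the first-order residual per cell, and the Fisher Vector adds second-order (variance) terms with soft posteriors, so for K cells of D-dimensional descriptors the output sizes are K, K·D and 2·K·D respectively.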

Cited by 64 publications (40 citation statements)
References 29 publications
“…Concurrently with [54], Delhumeau et al. [55] propose to normalize each residual vector instead of the residual sums; they also advocate for local PCA within each Voronoi cell, which does not perform dimension reduction as in [52]. A recent work [56] employs soft assignment and empirically learns optimal weights for each rank to improve over hard quantization.…”
Section: Encoding (mentioning; confidence: 99%)
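
The two VLAD modifications attributed to Delhumeau et al. [55] can be sketched as follows: residual normalization (each residual x - c_k is scaled to unit L2 norm before summing, rather than normalizing the per-cell sum) and a per-cell PCA rotation that keeps all D dimensions. This is a NumPy sketch under stated assumptions, not the authors' code: the helper names are hypothetical, the rotation is assumed to be learned from unit-normalized training residuals, and alpha = 0.5 power normalization is an assumed default.

```python
import numpy as np

def nearest_cell(descs, centroids):
    """Hard assignment: index of the closest centroid (squared L2) per descriptor."""
    d2 = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def unit_rows(r, eps=1e-12):
    """Scale every row to unit L2 norm (all-zero rows stay zero)."""
    return r / np.maximum(np.linalg.norm(r, axis=1, keepdims=True), eps)

def learn_local_rotations(train_descs, centroids):
    """Per-cell PCA basis of shape (K, D, D); all D components are kept."""
    k, d = centroids.shape
    a = nearest_cell(train_descs, centroids)
    rots = np.tile(np.eye(d), (k, 1, 1))
    for c in range(k):
        r = unit_rows(train_descs[a == c] - centroids[c])
        if len(r) < 2:
            continue  # too few samples in this cell: keep the identity rotation
        _, vecs = np.linalg.eigh(np.cov(r, rowvar=False))
        rots[c] = vecs[:, ::-1].T  # rows are eigenvectors, largest variance first
    return rots

def vlad_rn_lcs(descs, centroids, rots, alpha=0.5):
    """VLAD with residual normalization and per-cell rotation, then power + L2 norm."""
    a = nearest_cell(descs, centroids)
    r = unit_rows(descs - centroids[a])  # residual normalization
    r = np.einsum("nij,nj->ni", rots[a], r)  # express each residual in its cell's basis
    v = np.zeros_like(centroids, dtype=float)
    np.add.at(v, a, r)
    v = v.ravel()
    v = np.sign(v) * np.abs(v) ** alpha
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Because every residual has unit length, a descriptor far from its centroid contributes no more to the cell sum than a nearby one; the rotation only changes the coordinate system, so it preserves the norm of each residual.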
“…From these results, we arrive at three observations. First, among the SIFT-based methods, those with medium-sized codebooks [13], [31], [19] usually lead to superior (or competitive) performance, while those based on small codebooks (compact representations) [15], [18], [56] exhibit inferior accuracy. On the one hand, the visual words in the medium-sized codebooks lead to relatively high matching recall due to the large Voronoi cells.…”
Section: Accuracy Comparisons (mentioning; confidence: 99%)
“…Virtually all aggregation schemes rely on clustering in feature space, with varying degrees of sophistication: Bag-of-Words (BOW) [16], Vector of Locally Aggregated Descriptors (VLAD) [17], Fisher Vector (FV) [4], and Robust Visual Descriptor (RVD) [18]. BOW is effectively a fixed-length histogram with descriptors assigned to the closest visual word; VLAD additionally encodes the positions of local descriptors within each Voronoi region by computing their residuals; the Fisher Vector (FV) aggregates local descriptors using the Fisher Kernel framework (second-order statistics); and RVD combines rank-based multi-assignment with robust accumulation to reduce the impact of outliers.…”
Section: A. Methods Based on Hand-crafted Descriptors (mentioning; confidence: 99%)
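
The rank-based multi-assignment and robust accumulation described for RVD [18] can be illustrated with the hypothetical encoder below. Several details are assumptions, not taken from the source: each descriptor contributes to its three nearest cells, the per-rank weights (1.0, 0.5, 0.25) are placeholders rather than the empirically learned values, and robustness is obtained by normalizing every residual to unit length so that a distant outlier cannot outweigh nearby descriptors.

```python
import numpy as np

def rvd_sketch(descs, centroids, rank_weights=(1.0, 0.5, 0.25), alpha=0.5):
    """Hypothetical RVD-style encoder (not the authors' code).

    Each descriptor is assigned to its len(rank_weights) nearest centroids;
    the residual to the centroid at rank r is scaled to unit L2 norm and
    multiplied by rank_weights[r] before being accumulated in that cell.
    """
    k, d = centroids.shape
    m = len(rank_weights)
    if m > k:
        raise ValueError("need at least as many centroids as ranks")
    d2 = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = np.argsort(d2, axis=1)[:, :m]  # cell ids ordered by rank
    v = np.zeros((k, d))
    for rank, weight in enumerate(rank_weights):
        cells = nearest[:, rank]
        r = descs - centroids[cells]
        # unit-norm residuals: a distant outlier adds no more mass than a close inlier
        r = r / np.maximum(np.linalg.norm(r, axis=1, keepdims=True), 1e-12)
        np.add.at(v, cells, weight * r)
    v = v.ravel()
    v = np.sign(v) * np.abs(v) ** alpha  # power normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

With `rank_weights=(1.0,)` the sketch reduces to hard-assignment VLAD with per-residual normalization; longer weight tuples soften the quantization by letting each descriptor vote, with decreasing weight, into neighbouring cells.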
“…Kalantidis et al. [22] extended this work by introducing cross-dimensional weighting in aggregation of CNN features. The retrieval performance is further improved when the RVD-W method is used for aggregation of CNN-based deep descriptors [18]. Tolias et al. [2] proposed to extract Maximum Activations of Convolutions (MAC) descriptor from several multi-scale overlapping regions of the last convolutional layer feature map.…”
Section: B. Methods Based on CNN Descriptors (mentioning; confidence: 99%)
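
The CNN pooling schemes mentioned here operate on a convolutional feature map of shape (C, H, W). The sketch below shows plain MAC (a per-channel spatial maximum), a simplified multi-scale regional variant in the spirit of Tolias et al. [2], and CroW-style cross-dimensional weighting in the spirit of Kalantidis et al. [22]. The region grid, the omission of PCA-whitening, and the defaults a = b = 2 are assumptions of this sketch, all function names are hypothetical, and the feature map is assumed non-negative (post-ReLU).

```python
import numpy as np

def l2n(x, eps=1e-12):
    """L2-normalize a vector (all-zero vectors stay zero)."""
    return x / max(np.linalg.norm(x), eps)

def mac(fmap):
    """MAC: per-channel spatial maximum of a (C, H, W) feature map."""
    return fmap.reshape(fmap.shape[0], -1).max(axis=1)

def regional_mac(fmap, scales=(1, 2, 3)):
    """Simplified regional MAC: square regions on an s x s grid per scale,
    each region's MAC L2-normalized, summed, and L2-normalized again.
    (Tolias et al. also PCA-whiten the regional vectors; omitted here.)"""
    c, h, w = fmap.shape
    total = np.zeros(c)
    for s in scales:
        side = max(1, int(round(2 * min(h, w) / (s + 1))))  # region side shrinks with scale
        for y in np.linspace(0, h - side, s).round().astype(int):
            for x in np.linspace(0, w - side, s).round().astype(int):
                total += l2n(mac(fmap[:, y:y + side, x:x + side]))
    return l2n(total)

def crow(fmap, a=2.0, b=2.0, eps=1e-6):
    """CroW-style pooling; assumes non-negative (post-ReLU) activations."""
    c = fmap.shape[0]
    s = fmap.sum(axis=0)  # (H, W) channel-summed response
    s = (s / max((s ** a).sum() ** (1.0 / a), eps)) ** (1.0 / b)  # spatial weights
    pooled = (fmap * s[None]).sum(axis=(1, 2))  # spatially weighted sum pooling, (C,)
    q = (fmap > 0).reshape(c, -1).mean(axis=1)  # fraction of non-zeros per channel
    chan = np.log((c * eps + q.sum()) / (eps + q))  # sparser channels get larger weights
    return l2n(pooled * chan)
```

MAC keeps only the strongest response per channel, the regional variant adds coarse spatial coverage by pooling over several overlapping windows, and the CroW-style weighting instead keeps sum pooling but reweights both locations (by aggregate activation) and channels (by sparsity).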