for each event class, the research community has directed its efforts towards effectively combining textual and visual analysis techniques, for example by applying text analysis, exploiting large sets of DCNN-based concept detectors, and using re-ranking methods such as pseudo-relevance feedback or self-paced re-ranking. In this chapter, we survey the literature and present our research efforts towards improving concept- and event-based video search. For concept-based video search, we focus on i) feature extraction using hand-crafted and DCNN-based descriptors, ii) dimensionality reduction using accelerated generalised subclass discriminant analysis (AGSDA), iii) cascades of hand-crafted and DCNN-based descriptors, iv) multi-task learning (MTL) to exploit model sharing, and v) stacking architectures to exploit concept relations. For video event detection, we focus on methods that exploit positive examples, when available, again using DCNN-based features and AGSDA, and we also develop a framework for zero-example event detection that associates the textual description of an event class with the available visual concepts in order to identify the concepts most relevant to the event class. Additionally, we present a pseudo-relevance feedback mechanism that relies on AGSDA.
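The zero-example setting can be illustrated with a minimal sketch. Assuming a simple TF-IDF text-matching strategy (a stand-in for whatever textual-similarity model is actually used in the chapter), the textual description of an event class is compared with the names of the available concept detectors, and the most relevant concepts are then used to score unseen videos. The concept names, detector scores and event description below are purely hypothetical.

```python
# Sketch of zero-example event detection via concept selection.
# Assumption: textual relevance is measured with TF-IDF cosine similarity;
# the concept pool and detector outputs are illustrative placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pool of visual concept detectors and their per-video scores
# (3 videos x 5 concepts, values in [0, 1]).
concept_names = ["dog", "bicycle", "person riding bicycle",
                 "kitchen", "crowd cheering"]
detector_scores = np.random.rand(3, len(concept_names))

# Textual description of the zero-example event class.
event_description = "a person performs a trick on a bicycle in front of a cheering crowd"

# Rank concepts by textual similarity between the event description
# and the concept names.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform([event_description] + concept_names)
similarities = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
top_k = similarities.argsort()[::-1][:2]  # keep the two most relevant concepts

# Score each video by its detector outputs for the selected concepts,
# weighted by the textual relevance of each concept.
video_scores = detector_scores[:, top_k] @ similarities[top_k]
ranking = video_scores.argsort()[::-1]

print("Selected concepts:", [concept_names[i] for i in top_k])
print("Video ranking (best first):", ranking)
```

In the same spirit, the top-ranked videos returned by such a scheme could feed a pseudo-relevance feedback step, where they are treated as pseudo-positive examples for retraining; the specifics of the AGSDA-based mechanism are beyond this sketch.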