2013
DOI: 10.1007/s11263-013-0636-x

Image Classification with the Fisher Vector: Theory and Practice

Abstract: A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual words (BoV) representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strate…
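The extract, encode and pool pipeline described above, in its BoV form, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a codebook is already available (in practice typically learned with k-means), uses toy 2-D vectors in place of real patch descriptors, and picks average pooling (L1-normalized counts) as the pooling step. The function name `bov_encode` is hypothetical.

```python
import numpy as np

def bov_encode(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Hard-assign each local descriptor (N, D) to its nearest visual word in
    the codebook (K, D) and average-pool the assignments into a histogram."""
    # Squared Euclidean distance between every descriptor and every centroid: (N, K)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Quantization: index of the closest visual word for each descriptor
    nearest = d2.argmin(axis=1)
    # Encoding + pooling: count assignments per visual word
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    # Average pooling = counts divided by the number of descriptors
    return hist / hist.sum()

# Toy example: a hypothetical 2-word codebook and four 2-D "descriptors"
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.5, -0.2], [9.0, 11.0], [10.2, 9.7], [0.1, 0.3]])
print(bov_encode(descs, codebook))  # [0.5 0.5]
```

Two descriptors fall near each codeword, so each bin receives half of the mass. The resulting signature has only K dimensions, one per prototype, which is the limitation the Fisher Kernel encoding addresses.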

Cited by 1,426 publications (1,122 citation statements)
References: 57 publications

“…This improves on the highest accuracy of 38% reported in Xiao et al., which uses these 15 features combined without attributes. Scene classification with attributes falls short of the more recent features suggested by Sanchez et al., which achieve 47% average accuracy (Sanchez et al. 2013). The performances of scene classifiers trained on each low-level feature and attributes separately are shown in Fig.…”
Section: Attributes As Features For Scene Classification (mentioning, confidence: 91%)
“…Please see [29] for more details on the construction of FV representations. When computing the attribute representation, we use levels 2, 3, and 4, as well as 75 common bigrams at level 2, leading to 384 dimensions.…”
Section: Methods (mentioning, confidence: 99%)
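The 384-dimension total in the statement above can be reproduced by simple counting, but only under two assumptions the quote does not state: pyramid level L splits the word into L regions, and the unigram alphabet has 26 characters (chosen here because it is consistent with the stated total). The helper `attribute_dims` is hypothetical.

```python
# ASSUMPTION: 26-character unigram alphabet (not stated in the quoted text)
ALPHABET_SIZE = 26
LEVELS = [2, 3, 4]      # pyramid levels; ASSUMPTION: level L yields L regions
NUM_BIGRAMS = 75        # common bigrams, encoded at level 2 only
BIGRAM_LEVEL = 2

def attribute_dims(alphabet_size, levels, num_bigrams, bigram_level):
    """Count binary attributes: one per (character, region) pair over all
    levels, plus one per (bigram, region) pair at the bigram level."""
    unigram_dims = alphabet_size * sum(levels)   # 26 * (2 + 3 + 4) = 234
    bigram_dims = num_bigrams * bigram_level     # 75 * 2 = 150
    return unigram_dims + bigram_dims

print(attribute_dims(ALPHABET_SIZE, LEVELS, NUM_BIGRAMS, BIGRAM_LEVEL))  # 384
```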
“…In particular, we adopt the Fisher vector (FV) [29] representation computed over SIFT descriptors extracted densely from the word image. The Fisher vector can be understood as a bag of words that also encodes higher order statistics, and has been shown to be a state-of-the-art encoding method for several computer vision tasks such as image classification and retrieval [3].…”
Section: Introduction (mentioning, confidence: 99%)
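To make "higher order statistics" concrete, here is a minimal NumPy sketch of Fisher vector encoding with a diagonal-covariance GMM. The per-Gaussian soft assignments (the BoV-like part) weight normalized first-order (mean) and second-order (variance) deviations, and the result is power- and L2-normalized. The GMM parameters and descriptors are random toy values standing in for a trained model and dense SIFT, the function name `fisher_vector` is hypothetical, and the formulas follow the commonly used closed-form normalized gradients rather than any particular library.

```python
import numpy as np

def fisher_vector(x, w, mu, sigma2):
    """x: (T, D) local descriptors; w: (K,) mixture weights;
    mu, sigma2: (K, D) means and diagonal variances of a GMM."""
    T = x.shape[0]
    diff = x[:, None, :] - mu[None, :, :]                       # (T, K, D)
    # Log of w_k * N(x_t | mu_k, diag(sigma2_k)) for every descriptor/component
    log_p = np.log(w)[None, :] - 0.5 * (
        np.log(2 * np.pi * sigma2)[None, :, :] + diff ** 2 / sigma2[None, :, :]
    ).sum(axis=2)                                               # (T, K)
    # Soft assignments gamma_t(k): a numerically stable softmax over components
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    sigma = np.sqrt(sigma2)
    # First-order statistics: normalized gradient w.r.t. the means, (K, D)
    g_mu = (gamma[:, :, None] * diff / sigma[None]).sum(axis=0) / (
        T * np.sqrt(w)[:, None])
    # Second-order statistics: normalized gradient w.r.t. the std devs, (K, D)
    g_sigma = (gamma[:, :, None] * (diff ** 2 / sigma2[None] - 1)).sum(axis=0) / (
        T * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])        # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                      # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                        # L2 normalization

# Toy example with random (untrained) GMM parameters
rng = np.random.default_rng(0)
K, D = 4, 8
w = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
sigma2 = np.ones((K, D))
x = rng.normal(size=(100, D))
fv = fisher_vector(x, w, mu, sigma2)
print(fv.shape)  # (64,)
```

The encoding has 2KD dimensions (here 2 × 4 × 8 = 64), whereas a BoV histogram over the same K components has only K. This is why a much smaller vocabulary suffices for the Fisher vector.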
“…Compared with other sophisticated encoding algorithms, e.g., the improved Fisher kernel (IFK), the advantages of the BoW model lie in its theoretical simplicity and computational efficiency. It has been shown that the BoW is a special case of the Fisher kernel where the gradient computation is restricted to the mixture weight parameters of the GMM [71]. The BoW model with a hard assignment can be formulated in a match kernel framework with a linear kernel, which has been illustrated in [72].…”
Section: The BoW Model (mentioning, confidence: 99%)
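The match-kernel statement can be checked numerically. For hard assignment with a quantizer q, the dot product of two raw count histograms equals the number of descriptor pairs (x, y) with q(x) = q(y), i.e. <h_X, h_Y> = sum over x in X and y in Y of delta(q(x), q(y)). The sketch below uses random toy data and hypothetical helper names; it illustrates the identity and is not the formulation given in [72].

```python
import numpy as np

def quantize(descs, codebook):
    """Hard assignment: index of the nearest codeword for each descriptor."""
    d2 = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_linear_kernel(X, Y, codebook):
    """Linear kernel between raw (unnormalized) BoW count histograms."""
    k = codebook.shape[0]
    hx = np.bincount(quantize(X, codebook), minlength=k)
    hy = np.bincount(quantize(Y, codebook), minlength=k)
    return int(hx @ hy)

def match_kernel(X, Y, codebook):
    """Sum over all descriptor pairs of a delta match function:
    1 if both descriptors fall into the same visual word, else 0."""
    qx, qy = quantize(X, codebook), quantize(Y, codebook)
    return int((qx[:, None] == qy[None, :]).sum())

rng = np.random.default_rng(1)
codebook = rng.normal(size=(5, 3))
X = rng.normal(size=(40, 3))
Y = rng.normal(size=(25, 3))
print(bow_linear_kernel(X, Y, codebook) == match_kernel(X, Y, codebook))  # True
```

The identity holds because the k-th term of the dot product counts how many descriptors of X land in word k times how many of Y do, which is exactly the number of matching pairs in that cell.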