Image Classification with the Fisher Vector: Theory and Practice

Sánchez, Jorge M. Balestena; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob

doi:10.1007/s11263-013-0636-x

Cited by 1,426 publications

(1,122 citation statements)

References 57 publications

Citation statement types: Supporting, Mentioning (1,111), Contrasting, Unclassified


“…This improves on the 38 % highest accuracy reported in Xiao et al., which uses these 15 features combined without attributes. Scene classification with attributes falls short of the more recent features suggested by Sanchez et al., which achieve 47 % average accuracy (Sanchez et al. 2013). The performances of scene classifiers trained on each low-level feature and attributes separately are shown in Fig.…”

Section: Attributes As Features For Scene Classification (mentioning)

confidence: 91%

The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding

et al. 2014


In this paper we present the first large-scale scene attribute database. First, we perform crowdsourced human studies to find a taxonomy of 102 discriminative attributes. We discover attributes related to materials, surface properties, lighting, affordances, and spatial layout. Next, we build the "SUN attribute database" on top of the diverse SUN categorical database. We use crowdsourcing to annotate attributes for 14,340 images from 707 scene categories. We perform numerous experiments to study the interplay between scene attributes and scene categories. We train and evaluate attribute classifiers and then study the feasibility of attributes as an intermediate scene representation for scene classification, zero shot learning, automatic image captioning, semantic image search, and parsing natural images. We show that when used as features for these tasks, low dimensional scene attributes can compete with or improve on the state of the art performance. The experiments suggest that scene attributes are an effective low-dimensional feature for capturing high-level context and semantics in scenes.


“…Please cf. [29] for more details regarding the construction of FV representations. When computing the attribute representation, we use levels 2, 3, and 4, as well as 75 common bigrams at level 2, leading to 384 dimensions.…”

Section: Methods (mentioning)

confidence: 99%

“…In particular, we adopt the Fisher vector (FV) [29] representation computed over SIFT descriptors extracted densely from the word image. The Fisher vector can be understood as a bag of words that also encodes higher order statistics, and has been shown to be a state-of-the-art encoding method for several computer vision tasks such as image classification and retrieval [3].…”

Section: Introduction (mentioning)

confidence: 99%

Handwritten Word Spotting with Corrected Attributes

Almazán, Gordo, Fornés, et al. 2013

2013 IEEE International Conference on Computer Vision


We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.
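
The Introduction statement quoted for this paper describes the Fisher vector as a bag of words that also encodes higher-order statistics. A minimal sketch of that idea follows, assuming a diagonal-covariance GMM whose parameters are already fitted, gradients taken with respect to means and standard deviations only, and power plus L2 normalization applied at the end. The function name `fisher_vector` and the toy data are illustrative assumptions, not taken from any of the cited papers' code.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Encode local descriptors X (T, D) with a diagonal GMM of K components.

    weights: (K,) mixture weights; means, sigmas: (K, D).
    Returns a vector of length 2 * K * D (mean and std-dev gradients).
    """
    T = X.shape[0]
    # Whitened differences between every descriptor and every component.
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (T, K, D)

    # Log of weighted Gaussian densities; the shared (2*pi)^(-D/2) constant
    # cancels when the posteriors are normalized, so it is omitted.
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # soft assignments, (T, K)

    # First-order (mean) and second-order (std-dev) statistics per component.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv    # L2 normalization

# Toy usage with random descriptors and a hand-made GMM (assumed values).
rng = np.random.default_rng(0)
K, D = 4, 8
X = rng.normal(size=(200, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)
```

A plain bag of words would keep only one count per component; here each component contributes 2·D values, which is where the higher-order information comes from.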


“…Compared with other sophisticated encoding algorithms, e.g., the IFK, the advantages of the BoW model lie in its theoretic simplicity and computational efficiency. It has been shown that the BoW is a special case of the Fisher kernel where the gradient computation is restricted to the mixture weight parameters of the GMM [71]. The BoW model with a hard assignment can be formulated in a match kernel framework with a linear kernel, which has been illustrated in [72].…”

Section: The Bow Model (mentioning)

confidence: 99%

Action recognition via spatio-temporal local features: A comprehensive study

Zhen, Shao 2016

Image and Vision Computing


Local methods based on spatio-temporal interest points (STIPs) have shown their effectiveness for human action recognition. The bag-of-words (BoW) model has been widely used and dominated in this field. Recently, a large number of techniques based on local features including improved variants of the BoW model, sparse coding (SC), Fisher kernels (FK), vector of locally aggregated descriptors (VLAD) as well as the naive Bayes nearest neighbor (NBNN) classifier have been proposed and developed for visual recognition. However, some of them are proposed in the image domain and have not yet been applied to the video domain and it is still unclear how effectively these techniques would perform on action recognition. In this paper, we provide a comprehensive study on these local methods for human action recognition. We implement these techniques and conduct comparison under unified experimental settings on three widely used benchmarks, i.e., the KTH, UCF-YouTube and HMDB51 datasets. We discuss insightfully the findings from the experimental results and draw useful conclusions, which are expected to guide practical applications and future work for the action recognition community.
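
The statement quoted from this paper's BoW section says that the BoW is the special case of the Fisher kernel in which the gradient is restricted to the GMM mixture weights. The sketch below illustrates that relationship under stated assumptions: a diagonal-covariance GMM with given parameters, and a weight gradient of the form (mean posterior minus weight) divided by the square root of the weight, with normalization conventions that may differ between papers. All function names are hypothetical.

```python
import numpy as np

def posteriors(X, weights, means, sigmas):
    """Soft assignments gamma (T, K) of descriptors X (T, D) to K components."""
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)

def soft_bow(gamma):
    """Soft-assignment BoW histogram: mean posterior per component."""
    return gamma.mean(axis=0)

def hard_bow(gamma):
    """Hard-assignment BoW histogram: each descriptor votes for one component."""
    counts = np.bincount(gamma.argmax(axis=1), minlength=gamma.shape[1])
    return counts / gamma.shape[0]

def weight_gradient(gamma, weights):
    """Fisher-style gradient w.r.t. mixture weights (assumed normalization)."""
    return (soft_bow(gamma) - weights) / np.sqrt(weights)

# Toy usage (assumed values): the weight gradient is an affine function of
# the soft BoW histogram, so both carry the same zeroth-order information.
rng = np.random.default_rng(1)
K, D = 5, 6
X = rng.normal(size=(300, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))

gamma = posteriors(X, weights, means, sigmas)
h_soft = soft_bow(gamma)
h_hard = hard_bow(gamma)
g_w = weight_gradient(gamma, weights)
recovered = g_w * np.sqrt(weights) + weights  # invert the affine map
print(np.allclose(recovered, h_soft))
```

Because the weight gradient depends on the descriptors only through the per-component occupancy counts, it carries no more information than a soft BoW histogram; the mean and variance gradients of the full Fisher vector are what add first- and second-order statistics.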

