Proceedings of the British Machine Vision Conference 2014
DOI: 10.5244/c.28.6

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Abstract: The latest generation of Convolutional Neural Networks (CNN) have achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques,…
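
The evaluation protocol behind this comparison treats the activations of a pretrained network's penultimate fully connected layer as an image descriptor and trains a linear classifier on top, scoring it against shallow encodings. The following is a minimal sketch of that kind of pipeline, not the paper's own code: it assumes torchvision's pretrained VGG-16 as a stand-in for the CNN-F/M/S architectures studied in the paper, and the commented-out data loading at the end is a hypothetical placeholder.

```python
# Minimal sketch of the "deep descriptor + linear classifier" protocol.
# VGG-16 is used here as a stand-in for the paper's CNN-F/M/S networks.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import LinearSVC

# Pretrained network with the final classification layer removed, so it
# returns the 4096-D activation of the penultimate fully connected layer.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
feature_extractor = torch.nn.Sequential(
    vgg.features,
    vgg.avgpool,
    torch.nn.Flatten(),
    *list(vgg.classifier.children())[:-1],
)

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_descriptor(path):
    """L2-normalised deep descriptor for a single image file."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        f = feature_extractor(x).squeeze(0)
    return (f / f.norm()).numpy()

# Hypothetical usage: descriptors for a training split, then a linear SVM,
# mirroring the common "CNN features + linear classifier" evaluation.
# train_paths, train_labels = load_image_list("train.txt")   # placeholder
# X = [cnn_descriptor(p) for p in train_paths]
# clf = LinearSVC(C=1.0).fit(X, train_labels)
```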

Cited by 2,500 publications (1,994 citation statements)
References 27 publications

Citation statements (ordered by relevance):
“…Popular image representations are the Visual Bag-Of-Words Model [19,26,29], Fisher Vector [35] and its improved version [1,36]. However, as shown recently in [11,24], neural network based models have been shown to widely outperform these previous models. So, to fit with the CBOW representation discussed in the previous section, we choose to exploit the images by using a representation similar to the one used for the textual information, i.e.…”
Section: Textual and Visual Information (mentioning)
Confidence: 99%
“…A comprehensive evaluation further demonstrates the advantages of deep CNN features with respect to shallow handcrafted features for image classification [22]. The advantage is that researchers are freed from mastering domain-specific knowledge, and the CNN architecture can be reused across many different domains, especially in visual systems, with minor changes.…”
Section: Deep Convolutional Neural Network Based Feature (mentioning)
Confidence: 99%
“…CNNs have improved dramatically over the last few years, and many new powerful pre-trained networks are currently available. We compared three different features extracted from GoogLeNet (Szegedy et al., 2014), VGG 16 layers (Chatfield et al., 2014), and CaffeNet (Jia et al., 2014; Krizhevsky et al., 2012). Additionally, we tested the Fisher Vector (Perronnin et al., 2010), which was the standard hand-crafted image feature before deep learning.…”
Section: Effect Of Image Features (mentioning)
Confidence: 99%
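
The comparison described in that last excerpt, descriptors from several pretrained networks scored with one common downstream classifier, can be prototyped along the following lines. This is a hedged sketch rather than the cited authors' code: torchvision's googlenet and vgg16 stand in for the GoogLeNet, VGG-16 and CaffeNet models mentioned, and the evaluate(), val_images and val_labels names in the trailing comment are hypothetical.

```python
# Sketch: penultimate-layer descriptors from several pretrained networks,
# ready to be fed to one common downstream classifier.
import torch
import torch.nn.functional as F
import torchvision.models as models

# Replace each network's final classification layer with Identity so the
# forward pass returns its penultimate descriptor instead of class scores.
googlenet = models.googlenet(weights="IMAGENET1K_V1").eval()
googlenet.fc = torch.nn.Identity()            # 1024-D average-pooled feature

vgg16 = models.vgg16(weights="IMAGENET1K_V1").eval()
vgg16.classifier[-1] = torch.nn.Identity()    # 4096-D fc7 feature

extractors = {"googlenet": googlenet, "vgg16": vgg16}

@torch.no_grad()
def describe(name, batch):
    """batch: (N, 3, 224, 224) float tensor, ImageNet-normalised."""
    return F.normalize(extractors[name](batch), dim=1)

# A hand-crafted Fisher Vector baseline (dense SIFT + GMM) would be scored
# under the same classifier; that pipeline is omitted here for brevity.
# for name in extractors:                     # hypothetical evaluation loop
#     print(name, evaluate(describe(name, val_images), val_labels))
```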