2017
DOI: 10.1109/tip.2017.2688133

Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval

Abstract: Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted for tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the pure unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone the unsupervised retrieval task. We …

Citations: cited by 412 publications (240 citation statements)
References: 37 publications
“…There exist two kinds of attention approaches: weighting [3,10] and selection [8,28]. Weighting approaches create attention by emphasizing convolutional activations of relevant information or by reducing activation of irrelevant information via multiplying weights.…”
Section: Literature Review (mentioning)
confidence: 99%
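
The weighting idea in the statement above can be sketched as a weighted pooling of convolutional descriptors. This is only an illustrative sketch, not the method of any cited paper: the function name `weighted_average_pool`, the nested-list layout of the feature map, and the normalization by the sum of the weights are all assumptions made for the example.

```python
def weighted_average_pool(features, weights):
    """Pool an H x W grid of C-dimensional descriptors into one C-dimensional vector.

    features: nested lists, features[y][x] is a list of C floats.
    weights:  nested lists, weights[y][x] is a non-negative float.
    Normalizing by the weight sum is an assumption of this sketch.
    """
    channels = len(features[0][0])
    pooled = [0.0] * channels
    total = 0.0
    for feature_row, weight_row in zip(features, weights):
        for descriptor, weight in zip(feature_row, weight_row):
            total += weight
            for c in range(channels):
                pooled[c] += weight * descriptor[c]
    if total == 0:
        return pooled  # all weights zero: nothing to aggregate
    return [value / total for value in pooled]


# Toy 1 x 2 feature map with 2 channels (hypothetical values).
features = [[[1.0, 0.0], [3.0, 4.0]]]
uniform = [[1.0, 1.0]]    # plain global average pooling
emphasis = [[1.0, 3.0]]   # emphasizes the second location

plain = weighted_average_pool(features, uniform)
weighted = weighted_average_pool(features, emphasis)
```

With uniform weights this reduces to global average pooling; larger weights pull the pooled descriptor toward the emphasized locations.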
“…Selection approaches direct attention to important information by selecting convolutional features; and the process is equivalent to applying a binary weight spatially in the case of using global average pooling or maximum activation pooling as aggregation techniques. For example, Wei et al. [28] selected local features on the largest activated connected component of a convolutional layer. Hoang et al. [8] select deep convolutional local features via masks (i.e.…”
Section: Literature Review (mentioning)
confidence: 99%
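
The selection idea in the statement above (a binary spatial mask taken from the largest activated connected component, followed by average pooling) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation from the cited paper: summing channels to get the activation map, using the mean activation as the threshold, 4-connectivity, and the names `largest_component_mask` and `select_and_average` are all choices made for this example.

```python
from collections import deque


def largest_component_mask(activation, threshold):
    """Binary H x W mask of the largest 4-connected region with activation > threshold."""
    height, width = len(activation), len(activation[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for i in range(height):
        for j in range(width):
            if seen[i][j] or activation[i][j] <= threshold:
                continue
            # Breadth-first flood fill over one connected region.
            component, queue = [], deque([(i, j)])
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and activation[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    mask = [[0] * width for _ in range(height)]
    for y, x in best:
        mask[y][x] = 1
    return mask


def select_and_average(features):
    """Keep descriptors inside the largest activated component, then average them."""
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    # Assumption: activation map = channel-wise sum, threshold = its mean.
    activation = [[sum(descriptor) for descriptor in row] for row in features]
    threshold = sum(map(sum, activation)) / (height * width)
    mask = largest_component_mask(activation, threshold)
    kept = [features[y][x] for y in range(height) for x in range(width) if mask[y][x]]
    if not kept:
        return mask, [0.0] * channels  # e.g. a constant activation map
    pooled = [sum(d[c] for d in kept) / len(kept) for c in range(channels)]
    return mask, pooled


# Toy 3 x 3 feature map with a single channel (hypothetical values).
toy = [
    [[5.0], [4.0], [0.0]],
    [[0.0], [0.0], [0.0]],
    [[0.0], [0.0], [6.0]],
]
mask, pooled = select_and_average(toy)
```

In the toy map the isolated strong activation in the bottom-right corner is discarded because it belongs to a smaller component, which is the behavior that distinguishes selection from plain thresholding.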
“…Zheng et al. [24] group the convolutional channels to localize object parts in well-constrained spatial configurations. Wei et al. [25] use a simple thresholding method to discover object parts and select the largest component to represent the desired foreground object. In contrast, we formulate the discovery procedure for scene recognition, where more complex semantic regions and unconstrained spatial structures exist.…”
Section: B. Discriminative Region Discovery (mentioning)
confidence: 99%
“…Most fine-grained classification systems employ visual features of images to classify objects using a CNN [25][26][27][28][29], and subordinate classes from various domains such as flowers, birds, dogs, aircraft, and cars can be successfully recognized using these approaches. The objects are visually similar to each other, and can only be discriminated through subtle details.…”
Section: Fine-grained Classification (mentioning)
confidence: 99%
“…The objects are visually similar to each other, and can only be discriminated through subtle details. Most fine-grained classification systems employ visual features of images to classify objects using a CNN [25][26][27][28][29], and subordinate classes from various domains such as flowers, birds, dogs, aircraft, and cars can be successfully recognized using these approaches. To improve the classification performance, some approaches employ hierarchical semantic information such as a taxonomic rank [30], the semantic distance of WordNet [31], and text [15,17].…”
Section: Fine-grained Classification (mentioning)
confidence: 99%