2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP) 2019
DOI: 10.1109/mmsp.2019.8901787
Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval

Abstract: One of the key challenges of deep learning based image retrieval remains in aggregating convolutional activations into one highly representative feature vector. Ideally, this descriptor should encode semantic, spatial and low level information. Even though off-the-shelf pre-trained neural networks can already produce good representations in combination with aggregation methods, appropriate fine tuning for the task of image retrieval has shown to significantly boost retrieval performance. In this paper, we pres…

Cited by 8 publications (7 citation statements) | References 21 publications
“…For image similarity, VIRET, SOMHunter, VBS2020 Winner and CollageHunter all used embedded W2VV++ model features [31,34,40]. VIREO uses visual embeddings of the dual-task model [74], VERGE the last pooling layer of a fine-tuned GoogleNet [45], HTW a CNN with DARAC-Pooling [61] and VISIONE Resnet101-GeM [50] and TERN [39]. For color or semantic sketches, vitrivr supports a plethora of features [51,56], VERGE clusters to twelve predefined colors using the Color Layout MPEG-7 descriptor, and HTW uses a handcrafted low-level feature [18].…”
Section: Image and Sketch Search (mentioning)
confidence: 99%
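The excerpt above mentions GeM pooling (used by VISIONE with a ResNet-101 backbone). As a rough illustration, generalized-mean pooling can be sketched in a few lines of NumPy — a minimal sketch only, not the VISIONE implementation, and the exponent `p=3.0` is an illustrative default, not a value taken from the cited systems:

```python
import numpy as np

def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.

    activations: array of shape (C, H, W) -- one convolutional feature map.
    p: pooling exponent; p=1 recovers average pooling, large p approaches max pooling.
    """
    clamped = np.clip(activations, eps, None)       # keep bases positive for the power
    flat = clamped.reshape(clamped.shape[0], -1)    # (C, H*W)
    return np.power(np.mean(np.power(flat, p), axis=1), 1.0 / p)  # (C,) descriptor
```

With `p=1` the expression collapses to plain global average pooling, which is why GeM is often described as a learnable interpolation between average and max pooling.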
“…Retrieval quality could further be increased by training neural networks with large-scale datasets such as Google Landmarks v2 [12] and by introducing more sophisticated pooling methods. Instead of the commonly used global average pooling of the activation maps from the last convolutional layer, approaches that aggregate regional information obtained from the activation map into a single, global descriptor have been introduced in [4,13]. Furthermore, a very simple but effective pooling method has been shown to produce even better results in [6].…”
Section: Related Work (mentioning)
confidence: 99%
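The regional-aggregation idea referenced above (pooling vectors from sub-windows of the activation map into one global descriptor) can be sketched in an R-MAC-style form. This is a hedged illustration of the general technique, not the paper's DARAC method; the `region` and `stride` values are hypothetical:

```python
import numpy as np

def regional_aggregate(activations, region=4, stride=2):
    """Aggregate per-region max-pooled vectors into one global descriptor
    (an R-MAC-style sketch; window sizes are illustrative, not from the paper)."""
    C, H, W = activations.shape
    descriptor = np.zeros(C)
    for y in range(0, H - region + 1, stride):
        for x in range(0, W - region + 1, stride):
            r = activations[:, y:y + region, x:x + region].max(axis=(1, 2))  # per-channel max
            r = r / (np.linalg.norm(r) + 1e-12)                              # L2-normalize region
            descriptor += r                                                  # sum-aggregate regions
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)                 # final L2 normalization
```

The key contrast with global average pooling is that each region contributes a normalized vector, so locally strong activations are not washed out by averaging over the whole map.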
“…Similar to most vision-related tasks, deep learning models have taken over the field of content-based image retrieval (CBIR) over the course of the last decade [1][2][3]. More recent research has also shown that the performance of retrieval models can be further enhanced by applying a more appropriate loss objective from the metric-learning family, which enforces an embedding space with denser clusters and therefore often yields better nearest-neighbor search results [2,4,5]. Other publications introduce more complex pooling functions as a replacement for the commonly used global average pooling, to obtain even more discriminative image features [4,6].…”
Section: Introduction (mentioning)
confidence: 99%
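A classic example of a loss objective from the metric-learning family mentioned in the excerpt above is the triplet margin loss. The sketch below is a generic illustration of that family, not the specific loss used in the cited works, and the margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: pull same-class embeddings together and push
    different-class embeddings at least `margin` further away (margin is illustrative)."""
    d_pos = np.linalg.norm(anchor - positive)   # distance to the matching image
    d_neg = np.linalg.norm(anchor - negative)   # distance to a non-matching image
    return max(0.0, d_pos - d_neg + margin)     # zero once the gap exceeds the margin
```

Training with such an objective densifies per-class clusters in the embedding space, which is why nearest-neighbor retrieval improves compared with a plain classification loss.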
“…In [31], correlative factors for exploiting deep features for image retrieval of high-resolution remote sensing images were systematically investigated. Deep aggregation of regional convolutional activations for CBIR was performed using a supervised aggregation method and a non-linear approximation loss function [32].…”
Section: Introduction (mentioning)
confidence: 99%