Recent studies in image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvement. However, training different models for the ensemble is not only difficult but also inefficient with respect to time and memory. In this paper, we propose a novel framework that exploits multiple global descriptors to get an ensemble effect while it can be trained in an end-to-end manner. The proposed framework is flexible and expandable by the global descriptor, CNN backbone, loss, and dataset. Moreover, we investigate the effectiveness of combining multiple global descriptors with quantitative and qualitative analysis. Our extensive experiments show that the combined descriptor outperforms a single global descriptor, as it can utilize different types of feature properties. In the benchmark evaluation, the proposed framework achieves the state-of-theart performance on the CARS196, CUB200-2011, In-shop Clothes, and Stanford Online Products on image retrieval tasks. Our model implementations and pretrained models are publicly available 1 .
Many studies have been performed on metric learning, which has become a key ingredient in top-performing methods of instance-level image retrieval. Meanwhile, less attention has been paid to pre-processing and post-processing tricks that can significantly boost performance. Furthermore, we found that most previous studies used small scale datasets to simplify processing. Because the behavior of a feature representation in a deep learning model depends on both domain and data, it is important to understand how model behave in large-scale environments when a proper combination of retrieval tricks is used. In this paper, we extensively analyze the effect of well-known pre-processing, post-processing tricks, and their combination for largescale image retrieval. We found that proper use of these tricks can significantly improve model performance without necessitating complex architecture or introducing loss, as confirmed by achieving a competitive result on the Google Landmark Retrieval Challenge 2019.
NAVER/LINE Vision † , Search Solution Vision ‡ {arnilone 1 , kobiso62 6 , hyongkuen63 7 , jglee0206 8 , shine0624 10 }@gmail.com {heejae.jun 2 , insik.kim 3 , jongtack.kim 4 , kim.youngjoon 5 , leee.sangwon 9 }@navercorp.com Detection Model Query Image × m Weighted Boxes Fusion (WBF) Bounding Boxes Retrieval Model × n Embeddings Feature Concatenation k-NN Search Retrieval Results k-reciprocal Re-ranking rank-1 rank-2 rank-3 rank-4 Final Results rank-1 rank-2 rank-3 rank-4 PCA Whitening well-performed ill-performed Post-processings QE & DBA Figure 1: The proposed pipeline for real-world clothes retrieval that consists of three components: detection, retrieval and post-processing. Blue boxes indicate the post-processing methods that performed well while red is for those that did not.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.