Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have competed to extract the most discriminative features for improved representation of image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, both low-level and high-level, for CBIR. We compare the performance of CBIR systems using different deep features against state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient-learning techniques. We also conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features, and investigate the discriminative power of deep features using several similarity measures under different validation approaches. Furthermore, we examine the effects of dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, the discrete wavelet transform, and the discrete cosine transform. The experimental results demonstrate unprecedentedly high mean average precisions of 95% and 93% when using the VGG-16 FC7 deep features on the Corel-1000 and Coil-20 datasets, respectively.
Introduction

Given a set of images S and an input image i, the goal of a content-based image retrieval (CBIR) system is to search S and return the images most related/similar to i, based on their contents. This emergent field responds to an urgent need to search for an image based on its content, rather than typing text to describe the image content to be searched for. That is, CBIR systems allow users to conduct a query by image (QBI), and the system's task is to identify the images that are relevant to that query image. Prior to CBIR, the traditional means of searching for images was typing text describing the image content, known as query by text (QBT). However, QBT requires predefined image information, such as metadata, which necessitates human intervention to annotate images in order to describe their contents. This is unfeasible, particularly with the emergence of big data; for example, Flickr creates approximately 3.6 TB of image data, while Google deals with approximately 20,000 TB of data daily [1], most of which comprises images and videos. Applications of CBIR are numerous and span many areas, including, but not limited to, medical image analysis [2], image mining [3][4][5], surveillance [6], biometrics [7], security [8][9][10], and remote sensing [11].

The key to the success of a CBIR system lies in extracting features from an image to define its content. These features are stored to describe each image; this is implemented automatically by the system, using specific algorithms developed for the extraction process. Similarly, a query is processed by extracting the same features from the query image to determine the most similar images in the feature dataset, ...
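The query-by-image loop described above can be sketched minimally as follows: extract the same feature from every database image and from the query, then rank the database by a similarity (here, distance) measure. This is an illustrative sketch only, using a per-channel colour histogram (one of the primitive features discussed in this paper) and Euclidean distance; the function names and the toy random-image data are not from the paper.

```python
import numpy as np

def colour_histogram(image, bins=8):
    # Per-channel colour histogram, concatenated and L1-normalised,
    # giving a fixed-length feature vector regardless of image size.
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def retrieve(query, database, k=3):
    # Rank database images by Euclidean distance between feature
    # vectors and return the indices of the k most similar images.
    q = colour_histogram(query)
    dists = [np.linalg.norm(q - colour_histogram(img)) for img in database]
    return np.argsort(dists)[:k]

# Toy example: random RGB arrays stand in for a real image dataset.
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, size=(64, 64, 3)) for _ in range(10)]
query = db[4].copy()          # query identical to database image 4
print(retrieve(query, db))    # image 4 ranks first (distance zero)
```

In a real CBIR system the handcrafted histogram would be replaced by richer descriptors (e.g. SIFT, HOG, or VGG-16 FC7 activations), and the database features would be precomputed and stored rather than recomputed per query.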