A review on automatic image annotation techniques

Zhang, Dengsheng; Islam, Md. Monirul; Lu, Guojun

doi:10.1016/j.patcog.2011.05.013

Cited by 420 publications

(254 citation statements)

References 88 publications

Supporting

Mentioning

252

Contrasting

Unclassified

“…Images are manually annotated and subsequently retrieved in the same fashion as text documents using a database management system. Furthermore, traditional annotation has three disadvantages: Manual annotation requires significant level of human effort, the annotation is inaccurate due to the subjectivity of human perceptiveness, in addition to the Polysemy problem which means that the same word can refer to more than one object (Markkula and Sormunen, 2000; Zhang et al., 2012). These problems drew attention to image retrieval approaches based on the content.…”

Section: Introduction

mentioning

confidence: 99%

Combining SURF and MSER along with Color Features for Image Retrieval System Based on Bag of Visual Words

Elnemr

2016

Journal of Computer Science

Content-Based Image Retrieval (CBIR) has received an extensive attention from researchers due to the rapid growing and widespread of image databases. Despite the massive research efforts consumed for CBIR, the completely satisfactory results have not yet been attained. In this article, we offer a new CBIR technique that relies on extracting Speeded Up Robust Features (SURF) and Maximally Stable Extremal Regions (MSER) feature descriptors as well as the color features; color correlograms and Improved Color Coherence Vector (ICCV). These features are joined and used to build a multidimensional feature vector. Bag-of-Visual-Words (BoVW) technique is utilized to quantize the extracted feature vector. Then, a multiclass Support Vector Machine (SVM) is implemented to classify the query images. The performance of the presented retrieval framework is analyzed and scrutinized by comparing it with three alternative approaches. The first one is based on extracting SURF descriptors while the second one is based on extracting SURF descriptors, color correlograms and ICCV. The third approach, on the other hand, is based on extracting MSER, color correlograms and ICCV. All implemented schemes are tested on two benchmark datasets; Corel-1000 and COIL-100 datasets. The empirical results show that our suggested approach has a superior discriminative classification and retrieval performance with respect to other approaches. The proposed method achieves average precisions of 88 and 93% for the Corel-1000 and COIL-100 datasets, respectively. Moreover, the proposed system has shown a substantial advance in the retrieval precision when compared with other existing systems.
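The Bag-of-Visual-Words quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fixed, hand-written codebook and toy 2-D descriptors; in a real system the descriptors would come from a detector such as SURF or MSER and the codebook would typically be learned by clustering. The function name `bovw_histogram` is hypothetical and is not taken from the cited paper.

```python
from math import dist


def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and
    return the normalized histogram of word counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codebook entry closest to this descriptor (Euclidean).
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumed for illustration only).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram is the kind of vector that could then be passed to a classifier such as a multiclass SVM.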

“…However, in unconstrained CBVR the type of concepts to deal with is so wide that simpler and non-specialised descriptors are commonly used [4].…”

Section: Video Representation Space

mentioning

confidence: 99%

“…Nonetheless, the so-called semantic gap [22] between computable low-level features and query concepts is still a challenge for huge unconstrained video collections. The visual variability of semantic concepts is so high that often current approaches are not able to capture properly unconstrained queries in extensive collections [4]. Therefore, new capabilities are required in CBVR to bring the video characterisation to a higher semantic level.…”

Section: Limitations Of Current Approaches and Topic Models

mentioning

confidence: 99%

“…Content-Based Video Retrieval (CBVR) is concerned about providing users with those videos which satisfy their queries by means of the video content analysis. As a result, the CBVR field has become a very important research area and a wide variety of CBVR systems have been developed [1,2,3,4]. The standard CBVR procedure involves three main components: (i) a query, containing a few video examples of the semantic concept that the user is looking for; (ii) a database, which is used to retrieve videos related to the query concept; and (iii) a ranking function, which sorts the database according to the relevance with respect to the user's query.…”

Section: Introduction

mentioning

confidence: 99%

Latent topics-based relevance feedback for video retrieval

Fernández-Beltrán

Pla

2016

Pattern Recognition

This work presents a novel Content-Based Video Retrieval approach in order to cope with the semantic gap challenge by means of latent topics. Firstly, a supervised topic model is proposed to transform the classical retrieval approach into a class discovery problem. Subsequently, a new probabilistic ranking function is deduced from that model to tackle the semantic gap between low-level features and high-level concepts. Finally, a short-term relevance feedback scheme is defined where queries can be initialised with samples from inside or outside the database. Several retrieval simulations have been carried out using three databases and seven different ranking functions to test the performance of the presented approach. Experiments revealed that the proposed ranking function is able to provide a competitive advantage within the content-based retrieval field.
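The three-part retrieval procedure quoted from this paper (a query made of a few examples, a database, and a ranking function) can be sketched as follows. This is only an illustration under stated assumptions: items are represented by toy feature vectors, and relevance is taken to be the best cosine similarity to any query example. It is a plain similarity baseline, not the probabilistic topic-based ranking function the paper proposes, and the names `cosine` and `rank` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(database, query_examples):
    """Sort database item ids by decreasing relevance to the query,
    where relevance is the best match against any query example."""
    def relevance(item_id):
        return max(cosine(database[item_id], q) for q in query_examples)
    return sorted(database, key=relevance, reverse=True)


# Toy database of feature vectors (assumed for illustration only).
database = {"clip_a": (1.0, 0.0), "clip_b": (0.0, 1.0), "clip_c": (1.0, 1.0)}
query = [(1.0, 0.1)]
ranking = rank(database, query)
print(ranking)
```

A relevance feedback loop would, in this sketch, amount to appending user-approved items to `query` and calling `rank` again.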

“…they are related to text information. Much effort has been invested on automatic image annotation methods [1], since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time consuming and labour intensive procedure [2]. In automatic image annotation, a manually annotated set of data is used to train a system for the identification of joint or conditional probability of an annotation occurring together with a certain distribution of feature vectors corresponding to image content [3]. Different models and machine learning techniques were developed to learn the correlation between image features and textual words based on examples of annotated images.…”

mentioning

confidence: 99%

Image Retrieval: Modelling Keywords via Low-level Features

Θεοδοσίου

2015

ELCVIA

With the advent of cheap digital recording and storage devices and the rapidly increasing popularity of online social networks that make extended use of visual information, like Facebook and Instagram, image retrieval regained great attention among the researchers in the areas of image indexing and retrieval. Image retrieval methods are mainly falling into content-based and text-based frameworks. Although content-based image retrieval has attracted large amount of research interest, the difficulties in querying by an example propel ultimate users towards text queries. Searching by text queries yields more effective and accurate results that meet the needs of the users while at the same time preserves their familiarity with the way traditional search engines operate. However, text-based image retrieval requires images to be annotated i.e. they are related to text information. Much effort has been invested on automatic image annotation methods [1], since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time consuming and labour intensive procedure [2]. In automatic image annotation, a manually annotated set of data is used to train a system for the identification of joint or conditional probability of an annotation occurring together with a certain distribution of feature vectors corresponding to image content [3]. Different models and machine learning techniques were developed to learn the correlation between image features and textual words based on examples of annotated images. Learned models of this correlation are then applied to predict keywords for unseen images [4]. In the literature of automatic semantic image annotation, proposed approaches tend to classify images using only abstract terms or using holistic image features for both abstract terms and object classes. The extraction and selection of low-level features, either holistic or from particular image areas is of primary importance for automatic image annotation. This is true either for the content-based or for the text-based retrieval paradigm. In the former case the use of appropriate low-level features leads to accurate and effective object class models used in object detection while in the latter case, the better the low-level features are, the easier the learning of keyword models is. The intent of the image classification is to categorize the content of the input image to one of several keyword classes. A proper image annotation may contain more than one keyword that is relevant to the image content, so a reclassification process is required in this case, as well as whenever a new keyword class is added to the classification scheme. The creation of separate visual models for all keyword classes adds a significant value…
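The idea in this abstract of learning the conditional probability of a keyword given image features, then predicting keywords for unseen images, can be sketched with a simple co-occurrence model. The sketch assumes each image has already been reduced to a list of discrete visual tokens (the token names below are made up), and it is a generic illustration rather than the method of the cited paper. The names `train` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Count keyword occurrences per visual token.

    annotated: list of (visual_tokens, keywords) pairs from manually
    annotated images. Returns a mapping token -> Counter of keywords.
    """
    cooc = defaultdict(Counter)
    for tokens, keywords in annotated:
        for token in tokens:
            cooc[token].update(keywords)
    return cooc


def predict(cooc, tokens, top_k=2):
    """Score each keyword by summing the estimated P(keyword | token)
    over the image's tokens, and return the top_k keywords."""
    scores = Counter()
    for token in tokens:
        counts = cooc.get(token, Counter())  # unseen tokens contribute nothing
        total = sum(counts.values())
        for word, count in counts.items():
            scores[word] += count / total
    return [word for word, _ in scores.most_common(top_k)]


# Toy annotated training set (assumed for illustration only).
training = [
    (["blue", "green"], ["sky", "grass"]),
    (["blue", "sand"], ["sky", "beach"]),
    (["green"], ["grass"]),
]
model = train(training)
print(predict(model, ["blue"], top_k=1))
print(predict(model, ["green"], top_k=1))
```

Because every keyword is scored independently, an image can receive several keywords at once, which matches the abstract's point that a proper annotation may contain more than one relevant keyword.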
