With the advent of cheap digital recording and storage devices and the rapidly increasing popularity of online social networks that make extensive use of visual information, such as Facebook and Instagram, image retrieval has regained considerable attention among researchers in the areas of image indexing and retrieval. Image retrieval methods fall mainly into content-based and text-based frameworks.

Although content-based image retrieval has attracted a large amount of research interest, the difficulties of querying by example push end users towards text queries. Searching by text queries yields more effective and accurate results that meet the needs of users while preserving their familiarity with the way traditional search engines operate. However, text-based image retrieval requires images to be annotated, i.e. associated with text information. Much effort has been invested in automatic image annotation methods [1], since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time-consuming and labour-intensive procedure [2].

In automatic image annotation, a manually annotated set of data is used to train a system to estimate the joint or conditional probability of an annotation occurring together with a certain distribution of feature vectors corresponding to image content [3]. Different models and machine learning techniques have been developed to learn the correlation between image features and textual words from examples of annotated images. The learned models of this correlation are then applied to predict keywords for unseen images [4].

In the literature on automatic semantic image annotation, proposed approaches tend to classify images using only abstract terms or using holistic image features for both abstract terms and object classes. The extraction and selection of low-level features, either holistic or from particular image areas, is of primary importance for automatic image annotation.
This is true for both the content-based and the text-based retrieval paradigm. In the former case, the use of appropriate low-level features leads to accurate and effective object class models for object detection, while in the latter case, the better the low-level features are, the easier it is to learn keyword models.

The intent of image classification is to assign the content of the input image to one of several keyword classes. A proper image annotation may contain more than one keyword that is relevant to the image content, so a reclassification process is required in this case, as well as whenever a new keyword class is added to the classification scheme. The creation of separate visual models for all keyword classes adds a significant value

Correspondence to: zenonas.theodosiou@cut.ac.cy
Recommended for acceptance by Jorge Bernal