Every day, websites and personal archives generate an ever-growing number of photographs, and the extent of these archives is vast. The ease of use of these enormous digital image collections contributes to their popularity. However, not all of these databases provide appropriate indexing data, so it is difficult to find the information a user is interested in. Thus, in order to find information about an image, its content must be classified in a meaningful way. Image annotation is one of the most difficult problems in computer vision and multimedia research. The objective is to map an image to one or more labels, which requires an understanding of the image's visual content. One of the central challenges of image annotation is the lack of unambiguous information for building semantic-level concepts from raw image pixels. Unlike text, where a dictionary links words to their meanings and a well-defined syntax combines letters into words and words into sentences, raw image pixels are insufficient to construct semantic-level notions directly. This paper focused on automatic feature extraction for automatic annotation, employing a deep convolutional neural network to build and improve image encoding and annotation capabilities. The performance of the proposed technique was evaluated on the Corel-5K, ESP-Game, and IAPRTC-12 datasets, and the experimental findings on these three datasets demonstrate the usefulness of the model for image annotation.
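To make the multi-label annotation setting concrete, the following is a minimal NumPy-only sketch of the general pipeline the paragraph describes: convolutional features are extracted from raw pixels, pooled into a fixed-length vector, and passed through a per-label sigmoid head whose outputs are thresholded to produce a set of tags. The label vocabulary, filter shapes, and random parameters are all hypothetical illustrations, not the paper's architecture or learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical label vocabulary (illustrative only, not from the paper).
LABELS = ["sky", "water", "tree", "person", "building"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def annotate(image, conv_filters, weights, bias, threshold=0.5):
    """Toy multi-label annotator: convolution -> ReLU -> global
    average pooling -> linear head -> per-label sigmoid -> threshold."""
    h, w = image.shape
    k = conv_filters.shape[1]
    feats = []
    for f in conv_filters:  # 'valid' 2-D convolution, one scalar per filter
        out = np.zeros((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + k, j:j + k] * f)
        feats.append(np.maximum(out, 0.0).mean())  # ReLU + global avg pool
    logits = np.asarray(feats) @ weights + bias    # linear annotation head
    probs = sigmoid(logits)                        # independent per-label scores
    return [lab for lab, p in zip(LABELS, probs) if p >= threshold]

# Random stand-ins for an input image and learned parameters.
image = rng.random((16, 16))
filters = rng.standard_normal((4, 3, 3))
W = rng.standard_normal((4, len(LABELS)))
b = rng.standard_normal(len(LABELS))

tags = annotate(image, filters, W, b)
print(tags)
```

The key design point is the per-label sigmoid (rather than a softmax): each label is scored independently, so an image can receive several tags at once, which is what distinguishes annotation from single-label classification.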