“…In this paper, three main categories of features, i.e., geometric, conceptual, and visual, are used for content generation. In addition, a variety of methods have been proposed for image caption generation in the past. They can be broadly classified into three categories: template-based methods [14,30,51,37], retrieval-based methods [40,43,16,46,20], and deep neural network-based (encoder-decoder) methods [49,3,15,12]. Encoder-decoder models are often built with a CNN that encodes the image and extracts visual information, while an RNN decodes this visual information into a sentence.…”
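To make the CNN-encoder / RNN-decoder data flow concrete, here is a minimal, untrained sketch in pure Python. It is not the method of any cited work. It rests on these assumptions: the encoder is a single convolution layer followed by ReLU and global average pooling, and the decoder is a vanilla (Elman) RNN whose initial hidden state is computed from the image features and which decodes greedily. `VOCAB`, `encode`, `CaptionModel`, and the other names are hypothetical. The weights are random, so the generated "captions" carry no meaning. The point is only to show how an image becomes a feature vector and how the RNN then emits one word per step.

```python
import math
import random

# Hypothetical toy vocabulary; <start>/<end> mark sentence boundaries.
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "the", "grass", "sits"]

def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2-D image with no padding (cross-correlation)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]

def encode(image, kernels):
    """CNN-style encoder: conv -> ReLU -> global average pool, one value per kernel."""
    features = []
    for kernel in kernels:
        relu = [max(0.0, v) for row in conv2d_valid(image, kernel) for v in row]
        features.append(sum(relu) / len(relu))
    return features

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class CaptionModel:
    """Hypothetical encoder-decoder with random (untrained) weights."""

    def __init__(self, n_kernels=4, hidden=8, seed=0):
        rng = random.Random(seed)
        v = len(VOCAB)
        self.kernels = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
        self.w_init = rand_matrix(hidden, n_kernels, rng)  # image features -> h0
        self.w_xh = rand_matrix(hidden, v, rng)            # one-hot word -> hidden
        self.w_hh = rand_matrix(hidden, hidden, rng)       # recurrent weights
        self.w_hy = rand_matrix(v, hidden, rng)            # hidden -> word logits

    def caption(self, image, max_len=10):
        """Greedy decoding: emit the highest-scoring word until <end> or max_len."""
        h = [math.tanh(z) for z in matvec(self.w_init, encode(image, self.kernels))]
        start = VOCAB.index("<start>")
        token, words = start, []
        for _ in range(max_len):
            x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]
            h = [math.tanh(a + b)
                 for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
            logits = matvec(self.w_hy, h)
            # Never re-emit <start>; pick the best remaining word.
            token = max((i for i in range(len(VOCAB)) if i != start),
                        key=lambda i: logits[i])
            if VOCAB[token] == "<end>":
                break
            words.append(VOCAB[token])
        return words

if __name__ == "__main__":
    image = [[(r * c) % 5 / 4.0 for c in range(6)] for r in range(6)]
    print(CaptionModel().caption(image))
```

Practical encoder-decoder captioners replace both halves with much larger, trained components. They typically use a deep pretrained CNN and an LSTM decoder trained on image-caption pairs. The division of labour stays the same: the image is compressed into a feature vector, and that vector conditions a recurrent word-by-word generator.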