Image captioning is a challenging task at the intersection of computer vision and natural language processing. Many prior studies have addressed image captioning, but their evaluation scores remain low; this study therefore focuses on improving on those results. We use the Flickr8k dataset and the VGG16 Convolutional Neural Network (CNN) model as an encoder to extract features from images, with a Recurrent Neural Network (RNN), specifically a Bidirectional Long Short-Term Memory (BiLSTM) network, as the decoder. The feature vectors produced by the image feature extraction step are passed to the Bidirectional LSTM, which generates descriptions matching the visual content of the input image. The captions convey an object's name, location, color, size, and attributes, as well as its surroundings. Captions are decoded with a greedy search algorithm using the argmax function and with a beam search algorithm, and the outputs are evaluated with Bilingual Evaluation Understudy (BLEU) scores. The best result in this study is obtained by the VGG16 model with Bidirectional LSTM using beam search with parameter K = 3, which achieves a BLEU-1 score of 0.60593, surpassing previous studies.
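
The following Keras sketch illustrates the kind of pipeline described above, not the authors' exact implementation: VGG16 features feed a merge-style decoder whose sequence branch is a Bidirectional LSTM, and captions are decoded greedily with argmax. The vocabulary size, maximum caption length, layer widths, and the startseq/endseq tokens are illustrative assumptions.

```python
# Sketch of a VGG16 encoder + BiLSTM decoder captioning model (assumed
# hyperparameters, not the values used in the study).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, Bidirectional, add)
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size, max_len = 8000, 34  # hypothetical Flickr8k-scale values

# Encoder: VGG16 with its classifier head removed; the fc2 layer yields
# a 4096-d feature vector per image.
vgg = VGG16(weights="imagenet")
encoder = Model(vgg.input, vgg.get_layer("fc2").output)

# Decoder: project the image features and the partial caption to the same
# width, merge them, and predict the next word over the vocabulary.
img_in = Input(shape=(4096,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_len,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_vec = Bidirectional(LSTM(128))(Dropout(0.5)(seq_emb))  # 2 * 128 = 256-d

merged = add([img_vec, seq_vec])
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))
caption_model = Model([img_in, seq_in], out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

def greedy_caption(photo_feat, word_to_id, id_to_word):
    """Greedy (argmax) decoding: at each step, keep only the single most
    probable next word. Beam search instead keeps the K best partial
    captions at each step (K = 3 gave the best BLEU-1 in this study)."""
    seq = [word_to_id["startseq"]]
    for _ in range(max_len):
        padded = pad_sequences([seq], maxlen=max_len)
        probs = caption_model.predict([photo_feat, padded], verbose=0)[0]
        next_id = int(np.argmax(probs))          # pick the top word only
        if id_to_word[next_id] == "endseq":      # stop at the end token
            break
        seq.append(next_id)
    return " ".join(id_to_word[i] for i in seq[1:])
```

In this merge design, the image is injected once as a fixed vector rather than at every time step; the BiLSTM reads the partial caption in both directions before its state is combined with the image features to score the next word.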