Image captioning is an important task in image processing and machine vision. In the captioning pipeline considered here, the image is divided into different regions, similar attributes are extracted from each region, and a single caption is selected as the title of the input image. Given the considerable volume of images now available on the Internet, it is difficult to choose a suitable method for captioning images automatically and identifying their concepts correctly. In this study, a method based on a 14-layer Bidirectional LSTM neural network is used for automatic captioning. Image features are extracted with a MobileNet architecture pretrained on ImageNet. Because of the large size of the dataset and the heavy computation required for captioning, the MobileNet architecture was chosen to reduce the computational load and to boost the performance of the proposed method. The dataset is Flickr8k, with a volume of more than 1.04 GB, selected as one of the challenging image-captioning datasets available through the Kaggle site. The proposed method is implemented in Python with the TensorFlow and Keras artificial-intelligence packages on Google Colab. The proposed Bidirectional LSTM network was compared with five other models based on the LSTM neural network, and the results of these comparisons were evaluated in terms of Precision, Accuracy, Recall, F-score, and loss. Despite hardware limitations and the large volume of computation, the proposed model showed convincing performance compared to the other models. The obtained accuracy of 75.90% indicates improved accuracy as well as reduced cost in the captioning process for the proposed model.
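
As a rough illustration of the architecture described above, the following TensorFlow/Keras sketch pairs a MobileNet image encoder pretrained on ImageNet with a Bidirectional LSTM caption decoder. This is not the authors' exact 14-layer model; the layer widths, vocabulary size, and maximum caption length are assumed values chosen for illustration.

# Minimal sketch of a MobileNet + Bidirectional LSTM captioning model in
# TensorFlow/Keras. vocab_size, max_len, and all layer widths are assumed
# values, not the paper's exact 14-layer configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 8000   # assumed vocabulary size for Flickr8k captions
max_len = 34        # assumed maximum caption length in tokens

# Image encoder: MobileNet pretrained on ImageNet, frozen and used as a
# fixed feature extractor to keep the computational load low.
cnn = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

image_input = layers.Input(shape=(224, 224, 3))
img_features = layers.Dense(256, activation="relu")(cnn(image_input))

# Caption decoder: embedded token sequence processed by a Bidirectional
# LSTM (128 units per direction, so the merged output is 256-dimensional).
caption_input = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, 256, mask_zero=True)(caption_input)
x = layers.Bidirectional(layers.LSTM(128))(x)

# Merge the image and text representations and predict the next word.
merged = layers.add([img_features, x])
merged = layers.Dense(256, activation="relu")(merged)
output = layers.Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()

In this sketch the image feature vector and the text state are combined by elementwise addition, a common merge strategy in encoder-decoder captioning models; the paper's actual fusion scheme and layer count may differ.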