Image Captioning (IC) is one of the most widely discussed topics in Artificial Intelligence. In this paper, Myanmar image captions are generated using EfficientNetB7 and Bidirectional Long Short-Term Memory (Bi-LSTM) with GloVe embeddings, and comparative analysis results are described. To achieve better performance, a Myanmar image caption corpus is created by annotating over 50k sentences for 10k images, based on the Flickr8k dataset together with 2k images selected from the Flickr30k dataset. Two segmentation levels, word and syllable, are studied in the text pre-processing step, and our own GloVe vectors are constructed for both segmentations. To the best of our knowledge, this is the first attempt to apply syllable and word vector features in a neural network-based Myanmar IC system and to compare them with one-hot encoding vectors across different models. According to the evaluation results, EfficientNetB7 with Bi-LSTM using word and syllable GloVe embeddings outperforms EfficientNetB7 with Bi-LSTM using one-hot encoding, other recurrent networks such as the Gated Recurrent Unit (GRU), Bidirectional GRU (Bi-GRU), and Long Short-Term Memory (LSTM), VGG16 with Bi-LSTM and NASNetLarge with Bi-LSTM models, as well as baseline models. EfficientNetB7 with Bi-LSTM using GloVe vectors achieved the highest BLEU-4 score of 35.09%, with 49.52% ROUGE-L, 54.34% ROUGE-SU4, and 21.3% METEOR on word vectors, and the highest BLEU-4 score of 46.2%, with 65.62% ROUGE-L, 68.43% ROUGE-SU4, and 27.07% METEOR on syllable vectors.
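
To make the described architecture concrete, the following is a minimal Keras/TensorFlow sketch of a merge-style encoder-decoder combining a frozen EfficientNetB7 feature extractor, frozen GloVe embeddings, and a Bi-LSTM text branch. The constants (VOCAB_SIZE, MAX_LEN, EMBED_DIM), the 256-unit Bi-LSTM, the 512-unit projection, and the merge design are illustrative assumptions, not the paper's exact configuration; the GloVe matrix here is a random stand-in for the corpus-trained word or syllable vectors.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000   # hypothetical word/syllable vocabulary size
MAX_LEN = 30         # hypothetical maximum caption length
EMBED_DIM = 300      # hypothetical GloVe vector dimension
FEAT_DIM = 2560      # EfficientNetB7 pooled feature size

# Pretrained GloVe matrix (rows indexed by token id); random stand-in here.
glove_matrix = np.random.rand(VOCAB_SIZE, EMBED_DIM).astype("float32")

# Encoder: frozen EfficientNetB7 used as a fixed feature extractor.
# In practice, features are precomputed offline via cnn.predict(images).
cnn = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", pooling="avg")
cnn.trainable = False

# Decoder inputs: precomputed image features and a partial caption.
img_feats = layers.Input(shape=(FEAT_DIM,), name="image_features")
caption = layers.Input(shape=(MAX_LEN,), name="caption_tokens")

# Text branch: frozen GloVe embeddings feeding a Bi-LSTM (outputs 512 dims).
x = layers.Embedding(
    VOCAB_SIZE, EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
    trainable=False, mask_zero=True)(caption)
x = layers.Bidirectional(layers.LSTM(256))(x)

# Image branch: project CNN features into the same space, then merge.
y = layers.Dense(512, activation="relu")(img_feats)
merged = layers.add([x, y])
out = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_feats, caption], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, such a decoder is typically run autoregressively: the caption input starts with a start token, the highest-probability (or beam-searched) token is appended, and the model is re-invoked until an end token or MAX_LEN is reached.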