The task of generating natural language descriptions from images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches in an automatic manner. In this paper, we provide an in-depth evaluation of the existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the utilization of the recently proposed Word Mover's Distance (WMD) document metric for the purpose of image captioning. Our findings outline the differences and/or similarities between metrics and their relative robustness by means of extensive correlation, accuracy and distraction based evaluations. Our results also demonstrate that WMD provides strong advantages over other metrics.
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, we propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, we present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. We demonstrate the effectiveness of our proposed approach on Flickr8K and Flickr30K benchmark datasets and show that our model gives highly competitive results compared to the state-of-the-art models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.