Image captioning is the task of generating textual descriptions of images. In recent years, publicly available large-scale datasets and deep learning-based algorithms have driven rapid progress in this field. However, little research has addressed the captioning of images of drug-related paraphernalia, a topic that, despite its importance for both drug prevention and police enforcement, is not covered by existing image captioning studies. In this paper, we propose DrunaliaCap, a deep learning-based system for automatically generating both "factual" descriptions (what is in the image) and "functional" descriptions (how each item of paraphernalia is used during drug-taking) of images of drug-related paraphernalia. We constructed a new dataset covering 20 categories of drug-related items and trained deep learning-based models for the proposed system. We further proposed a method to evaluate and optimize caption generation so that the generated captions do not omit important knowledge. Experiments were conducted to validate the performance of the newly proposed dataset and method. We analyzed the experimental results and discussed the significance, limitations, and potential applications of our work.

INDEX TERMS image captioning, drug prevention, dataset construction, deep learning

VOLUME 4, 2016 This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.