Existing image captioning methods focus only on understanding the relationships between objects or instances within a single image, without exploring the contextual correlations that exist among similar images. In this paper, we propose Dual Graph Convolutional Networks (Dual-GCN) with a transformer and curriculum learning for image captioning. In particular, we not only use an object-level GCN to capture the object-to-object spatial relations within a single image, but also adopt an image-level GCN to capture the feature information provided by similar images. With the well-designed Dual-GCN, the linguistic transformer can better understand the relationships between different objects in a single image and make full use of similar images as auxiliary information to generate a reasonable caption for a single image. Meanwhile, with a cross-review strategy introduced to determine difficulty levels, we adopt curriculum learning as the training strategy to increase the robustness and generalization of our proposed model. We conduct extensive experiments on the large-scale MS COCO dataset, and the experimental results demonstrate that our proposed method outperforms recent state-of-the-art approaches, achieving a BLEU-1 score of 82.2 and a BLEU-2 score of 67.6. Our source code is available at https://github.com/Unbear430/DGCN-for-image-captioning.
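As a rough illustration of the core operation behind both the object-level and image-level branches (a minimal sketch with NumPy, not the authors' implementation; the adjacency matrix, feature sizes, and weights here are toy assumptions), a single graph-convolution propagation step over detected object features might look like:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: H = ReLU(D^-1/2 (A + I) D^-1/2 X W).

    A: (n, n) adjacency matrix over graph nodes (e.g. detected objects),
    X: (n, d_in) node features, W: (d_in, d_out) learnable weights.
    Illustrative only -- training, batching, and stacking are omitted.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
    return np.maximum(H, 0.0)               # ReLU

# Toy example: 3 objects with 4-d features, projected to 2-d
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
H = gcn_layer(A, X, W)
print(H.shape)  # (3, 2)
```

In the object-level branch the nodes would be region features with spatial relations as edges; in the image-level branch, the nodes would be features of similar images.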
CCS CONCEPTS: • Computing methodologies → Natural language processing; Scene understanding.
With the rapid development of the Internet and the film industry, online reviews have grown explosively. Online review data contain a great deal of valuable information; analyzing this information with text mining techniques such as the LDA topic model can help movie creators understand the public's viewing needs and help production teams reflect on shortcomings in the production process, providing a reference for the development of the film industry. In this paper, Douban online review data for the MCU are used as the research object, and the Chinese text is analyzed based on the LDA topic model. First, data collection and preprocessing are carried out with various software tools. Then, a word cloud is used to visualize the core information in the online review data, and the LDA topic model is applied for deeper semantic mining. The results show that rich and varied plot design, the audiovisual feast brought by advanced digital technology, the clever connection of bonus scenes, unique characters and actors, and the superhero commercial movie model are closely related to the success of the MCU.
The aim of this work is to develop an algorithm for a face recognition system that uses object-oriented databases. The system automatically identifies a desired object or person from a digital photo or a video frame taken from a video source. The technology compares pre-scanned face elements from the resulting image with prototypes of faces stored in the database. Modern object-oriented database packages allow the user to create a new class with specified attributes and methods, derive classes that inherit attributes and methods from superclasses, create instances of a class, each of which has a unique object identifier, retrieve these instances one by one or in groups, and load and execute these procedures. Using a convolutional neural network in the algorithm allows the transition from specific image features to more abstract details.
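The matching step described above — comparing a query face against stored prototypes keyed by unique object identifiers — can be sketched as follows (a minimal illustration, not the paper's system; the 128-dimensional embeddings, identifiers, and threshold are assumptions, and in practice the embeddings would come from the CNN):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(query_emb, prototypes, threshold=0.8):
    """Match a query face embedding against stored prototypes.

    prototypes: dict mapping a unique object identifier to its embedding.
    Returns (best_id, similarity), or (None, similarity) below threshold.
    """
    best_id, best_sim = None, -1.0
    for oid, emb in prototypes.items():
        sim = cosine_similarity(query_emb, emb)
        if sim > best_sim:
            best_id, best_sim = oid, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

# Toy database of hypothetical CNN face embeddings
rng = np.random.default_rng(1)
db = {"person_001": rng.normal(size=128), "person_002": rng.normal(size=128)}
query = db["person_001"] + 0.05 * rng.normal(size=128)  # slightly perturbed face
match_id, sim = identify(query, db)
print(match_id)
```

In a full system the `prototypes` dictionary would be backed by class instances in the object-oriented database, retrieved by their object identifiers.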