The visual storytelling (VST) task aims to generate a reasonable and coherent paragraph-level story from an image stream. Unlike image captioning, which gives a direct and literal description of image content, stories in the VST task tend to contain many imaginary concepts that do not appear in the images. This requires the AI agent to reason about and associate such imaginary concepts using implicit commonsense knowledge in order to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model that introduces crucial commonsense from an external knowledge base for visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, a carefully designed vision-aware directional encoding scheme is adopted to effectively integrate the most informative commonsense. In addition, we maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach outperforms state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With the additional commonsense and the semantic-relevance-based objective, the generated stories are more diverse and coherent.
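As a rough illustration of the retrieval-and-selection step sketched in this abstract (extracting candidate commonsense from an external knowledge base and keeping only the most informative pieces), the following minimal Python sketch may help. The function names, the scoring heuristic, and the toy knowledge base are hypothetical placeholders, not the paper's actual model: the vision-aware directional encoding would replace the simple overlap score used here.

```python
# Hypothetical sketch of commonsense retrieval and selection for visual storytelling.
# Not the paper's implementation; all names and the toy KB are illustrative only.

from typing import Dict, List


def retrieve_candidates(image_concepts: List[str],
                        knowledge_base: Dict[str, List[str]]) -> List[str]:
    """Collect commonsense phrases linked to the concepts detected in the images."""
    candidates: List[str] = []
    for concept in image_concepts:
        candidates.extend(knowledge_base.get(concept, []))
    return candidates


def vision_aware_score(candidate: str, image_concepts: List[str]) -> float:
    """Toy relevance heuristic: count visual concepts mentioned in the candidate.
    The paper's vision-aware directional encoding would replace this."""
    return float(sum(concept in candidate for concept in image_concepts))


def select_commonsense(image_concepts: List[str],
                       knowledge_base: Dict[str, List[str]],
                       top_k: int = 3) -> List[str]:
    """Keep only the most informative commonsense candidates for the decoder."""
    candidates = retrieve_candidates(image_concepts, knowledge_base)
    ranked = sorted(set(candidates),
                    key=lambda c: vision_aware_score(c, image_concepts),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Tiny hand-made knowledge base standing in for an external KB such as ConceptNet.
    kb = {"beach": ["relaxing on the beach", "building a sand castle", "summer vacation"],
          "cake": ["blowing out candles on the cake", "birthday celebration"]}
    print(select_commonsense(["beach", "cake"], kb))
```

The selected phrases would then condition the story decoder, whose semantic-relevance objective encourages coherence across the generated sentences.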
As a highly endangered species, the giant panda (panda) has attracted significant attention over the past decades. Considerable effort has been devoted to panda conservation and reproduction, with promising results in maintaining the panda population size. To evaluate the effectiveness of conservation and management strategies, recognizing individual pandas is critical. However, it remains a challenging task because existing methods, such as traditional tracking, footprint-based discrimination, and molecular biology methods, are invasive, inaccurate, expensive, or difficult to perform. Advances in imaging technology have led to the wide application of digital images and videos in panda conservation and management, making it possible to recognize individual pandas noninvasively through image-based panda face recognition.
In recent years, deep learning has achieved great success in computer vision and pattern recognition. For panda face recognition, this study develops a fully automatic deep learning algorithm consisting of a sequence of deep neural networks (DNNs) for panda face detection, segmentation, alignment, and identity prediction. To develop and evaluate the algorithm, the largest panda image dataset to date is established, containing 6,441 images of 218 different pandas, or 39.78% of the world's captive panda population.
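The abstract describes a cascaded pipeline in which each stage feeds the next. The outline below is a schematic sketch of that flow only; every stage function is a hypothetical placeholder rather than the networks actually trained in the study.

```python
# Schematic outline of a cascaded face-recognition pipeline:
# detection -> segmentation -> alignment -> identity prediction.
# All stage functions are illustrative stubs, not the study's DNNs.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recognition:
    panda_id: str
    confidence: float


def detect_face(image: Dict) -> Dict:
    """Stage 1: locate the panda face (dummy box covering the whole image)."""
    return {"box": (0, 0, image["width"], image["height"])}


def segment_face(image: Dict, box) -> Dict:
    """Stage 2: separate the face region from the background."""
    return {"mask": "face-region"}


def align_face(face: Dict) -> Dict:
    """Stage 3: normalize pose and scale before identification."""
    return {"aligned": True}


def identify(aligned_face: Dict, gallery: List[str]) -> Recognition:
    """Stage 4: predict the individual identity (dummy: first gallery entry)."""
    return Recognition(panda_id=gallery[0], confidence=0.96)


def recognize(image: Dict, gallery: List[str]) -> Recognition:
    face = detect_face(image)
    mask = segment_face(image, face["box"])
    aligned = align_face(mask)
    return identify(aligned, gallery)


if __name__ == "__main__":
    print(recognize({"width": 640, "height": 480}, ["panda_001", "panda_002"]))
```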
The algorithm achieved 96.27% accuracy in panda recognition and 100% accuracy in detection.
This study shows that panda faces can be used for individual panda recognition. It enables cameras installed in panda habitats to be used for monitoring population and behavior. This noninvasive approach is much more cost-effective than the approaches used in previous panda surveys.