Content-based image representation is a challenging task when we rely on visual content alone. Associated metadata (such as tags or geolocation), however, provide a valuable source of complementary information that can enhance system performance. In this paper, we propose an automatic training framework that uses both image visual content and metadata to fine-tune deep Convolutional Neural Networks (CNNs), yielding image descriptors better adapted to specific locations, such as cities or regions. Specifically, we propose to estimate weak labels by combining visual- and location-related information and to incorporate them into a novel loss function defined over pairs of images. Our experiments on a landmark discovery task show that this training procedure improves performance by up to 55% over well-established CNN-based models and does not suffer from overfitting.
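As a rough illustration of the pairwise idea described above, the sketch below shows a standard contrastive loss driven by weak pair labels, written in PyTorch. The function name, its arguments, and the margin-based formulation are assumptions made for illustration only; the paper's actual loss function and weak-label estimation procedure are not reproduced here.

```python
import torch
import torch.nn.functional as F

def weak_pairwise_loss(emb_a: torch.Tensor,
                       emb_b: torch.Tensor,
                       weak_label: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss over image pairs guided by weak labels.

    emb_a, emb_b : (N, D) CNN embeddings of the two images in each pair.
    weak_label   : (N,) tensor with 1.0 for pairs weakly labelled as matching
                   (e.g. visually similar images taken at nearby locations)
                   and 0.0 otherwise. How these labels are derived from
                   visual and location metadata is an assumption here.
    """
    # Euclidean distance between the two embeddings of each pair.
    dist = F.pairwise_distance(emb_a, emb_b)

    # Pull weakly matching pairs together, push non-matching pairs
    # at least `margin` apart (standard contrastive formulation).
    pos = weak_label * dist.pow(2)
    neg = (1.0 - weak_label) * F.relu(margin - dist).pow(2)
    return 0.5 * (pos + neg).mean()
```

In such a setup, the loss would be backpropagated through the CNN during fine-tuning so that the learned descriptors reflect the location-aware weak supervision rather than generic pre-training labels.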