This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. We also describe the preliminary experiments with text-only translation systems that led us to this choice. Our system is the top-scoring one for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small; our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.
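The abstract does not spell out the fusion mechanism. Purely as an illustration, the sketch below shows one common way to condition a Transformer encoder on image features: project a global visual feature vector into the model dimension and prepend it as a pseudo-token. The module names, dimensions, and fusion strategy are assumptions for the sketch, not the paper's actual method.

```python
# Minimal sketch (illustrative assumptions, not the MeMAD system) of
# multimodal conditioning: a projected image feature is prepended to the
# token embeddings as an extra "visual token" before Transformer encoding.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, img_dim=2048,
                 nhead=8, num_layers=6):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)  # map image feature to model dim
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens, img_feat):
        # tokens: (batch, seq_len) token ids; img_feat: (batch, img_dim)
        x = self.tok_emb(tokens)                       # (batch, seq, d_model)
        v = self.img_proj(img_feat).unsqueeze(1)       # (batch, 1, d_model)
        return self.encoder(torch.cat([v, x], dim=1))  # visual pseudo-token first

enc = MultimodalEncoder()
out = enc(torch.randint(0, 10000, (2, 7)), torch.randn(2, 2048))
print(out.shape)  # torch.Size([2, 8, 512])
```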
In this paper, we introduce a violent scenes and violence-related concept detection dataset named VSD2014. It contains annotations as well as auditory and visual features of typical Hollywood movies and user-generated footage shared on the web. The dataset is the result of a joint annotation endeavor of different research institutions and responds to the real-world use case of parental guidance in selecting appropriate content for children. The dataset has been validated during the Violent Scenes Detection (VSD) task at the MediaEval benchmarking initiative for multimedia evaluation.
A significant fraction of information searches are motivated by the user's primary task. An ideal search engine would be able to use information captured from the primary task to proactively retrieve useful information. Previous work has shown that many information retrieval activities depend on the primary task in which the retrieved information is to be used, but relatively little research has focused on methods that automatically learn informational intents from the primary task context. We study how the implicit primary task context can be used to model the user's search intent and to proactively retrieve relevant and useful information. Data comprising logs from a user study, in which users wrote an essay, demonstrate that users' search intents can be captured from the task and that relevant and useful information can be proactively retrieved. Data from simulations with several datasets of varying complexity show that the proposed approach of using primary task context generalizes to a variety of data. Our findings have implications for the design of proactive search systems that can infer users' search intent implicitly by monitoring users' primary task activities.
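As an illustration of the general idea only (not the study's actual model), the sketch below treats the text a user has written so far as an implicit query and ranks a small document collection by TF-IDF cosine similarity. The corpus, variable names, and ranking choice are all assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's model) of proactive
# retrieval from primary-task context: the in-progress essay text serves
# as an implicit query against a document collection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # stand-in collection; in practice, a real corpus or index
    "Transformer architectures for neural machine translation",
    "Parental guidance and content rating for online video",
    "Proactive search and implicit query formulation from user tasks",
]

essay_so_far = "The system should infer what the writer needs and search proactively"

vectorizer = TfidfVectorizer(stop_words="english")
doc_vecs = vectorizer.fit_transform(documents)
task_vec = vectorizer.transform([essay_so_far])

# Rank documents by similarity to the implicit task context.
scores = cosine_similarity(task_vec, doc_vecs).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```

A deployed system would replace the static list with a search index and re-rank as the task context evolves; this sketch only shows the intent-from-context signal.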
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.