Multimedia Event Detection (MED) is a multimedia retrieval task with the goal of finding videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. In this paper, we mainly focus on an 'ad-hoc' scenario in MED where we do not use any example video. We aim to retrieve test videos based on their visual semantics using a Visual Concept Signature (VCS) generated for each event only derived from the event description provided as the query. Visual semantics are described using the Semantic INdexing (SIN) feature which represents the likelihood of predefined visual concepts in a video. To generate a VCS for an event, we project the given event description to a visual concept list using the proposed textual semantic similarity. Exploring SIN feature properties, we harmonize the generated visual concept signature and the SIN feature to improve retrieval performance. We conduct different experiments to assess the quality of generated visual concept signatures with respect to human expectation, and in the context of the MED task to retrieve the SIN feature of videos in the test dataset when we have no or only very few training videos.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.