In this paper, we consider the problem of automatically detecting a large number of visual concepts in images or video shots. State-of-the-art systems generally involve feature (descriptor) extraction, classification (supervised learning), and fusion when several descriptors and/or classifiers are used. Though direct multi-label approaches are considered in some works, detection scores are often computed independently for each target concept. We propose a method that we call "conceptual feedback", which implicitly takes into account the relations between concepts to improve overall concept detection performance. A conceptual descriptor is built from the system's output scores and fed back by adding it to the pool of already available descriptors. This process can be iterated several times. Moreover, we propose three extensions of our method. Firstly, the conceptual dimensions are weighted to give more importance to concepts that are more correlated with the target concept. Secondly, an explicit selection of a set of concepts that are semantically or statistically related to the target concept is introduced. Thirdly, for video indexing, we integrate the temporal dimension into the feedback process by taking into account the conceptual and temporal dimensions simultaneously when building the high-level descriptor. Our proposals have been evaluated in the context of the TRECVid 2012 semantic indexing task, which involves the detection of 346 visual or multi-modal concepts. Overall, combined with temporal re-scoring, the proposed method increased the global system performance (MAP) from 0.2613 to 0.3082 (a relative improvement of 17.9%), while temporal re-scoring alone increased it only from 0.2613 to 0.2691 (a relative improvement of 3.0%).
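The conceptual descriptor and its weighted variant can be illustrated with a minimal sketch. It assumes first-pass detection scores are available as a list of per-shot score vectors, and it weights each conceptual dimension by the absolute Pearson correlation between that concept's scores and the target concept's scores. The function names, the data layout, and the choice of Pearson correlation over score columns are assumptions made for illustration; the abstract does not specify how the weights are computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def conceptual_descriptors(scores, target, weighted=True):
    """Build one conceptual descriptor per shot from first-pass scores.

    scores[i][c] is the detection score of concept c on shot i.
    With weighted=True, dimension c is scaled by |corr(concept c, target)|
    (a hypothetical weighting scheme, chosen here for illustration only).
    """
    n_concepts = len(scores[0])
    # Column c holds the scores of concept c over all shots.
    columns = [[row[c] for row in scores] for c in range(n_concepts)]
    if weighted:
        weights = [abs(pearson(columns[c], columns[target]))
                   for c in range(n_concepts)]
    else:
        weights = [1.0] * n_concepts
    return [[w * s for w, s in zip(weights, row)] for row in scores]


# Toy first-pass scores: 3 shots, 3 concepts (made-up values).
first_pass = [
    [0.9, 0.8, 0.5],
    [0.1, 0.2, 0.5],
    [0.7, 0.6, 0.5],
]
descriptors = conceptual_descriptors(first_pass, target=0)
```

In a full system, these descriptors would be added to the pool of existing descriptors and a classifier for the target concept would be retrained on them; repeating this with the new scores corresponds to iterating the feedback.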
Context helps to understand the meaning of a word and allows the disambiguation of polysemous terms. Many studies in information retrieval have taken advantage of this notion. For concept-based video indexing and retrieval, the idea seems a priori valid. One of the major problems is then to define the context and to choose the most appropriate methods for using it. Two kinds of context have been exploited in the past to improve concept detection: some works use inter-concept relations as a semantic context, whereas other approaches use the temporal features of videos. The results of these works showed that both the "temporal" and the "semantic" contexts can improve concept detection. In this work, we use the semantic context through an ontology and exploit the efficiency of the temporal context in a "two-layers" re-ranking approach. Experiments conducted on TRECVID 2010 data show that the proposed approach always improves over the initial results obtained using either MSVM or KNN classifiers or their late fusion, achieving relative MAP gains between 9% and 33%.
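The use of temporal context for re-ranking can be sketched as follows. This is only an illustration of the general idea that the scores of neighboring shots in the same video inform the score of the current shot; it is not the "two-layers" method itself. The window size, the mixing weight `alpha`, and the function name are hypothetical.

```python
def temporal_rescore(scores, window=2, alpha=0.5):
    """Blend each shot's score with the mean score of its temporal neighbors.

    scores: detection scores of one concept for consecutive shots of one video.
    window: number of shots considered on each side (hypothetical parameter).
    alpha:  weight given to the temporal context (hypothetical parameter).
    """
    rescored = []
    for i, s in enumerate(scores):
        lo = max(0, i - window)
        hi = min(len(scores), i + window + 1)
        # Neighbors exclude the current shot itself.
        neighbors = scores[lo:i] + scores[i + 1:hi]
        context = sum(neighbors) / len(neighbors) if neighbors else s
        rescored.append((1 - alpha) * s + alpha * context)
    return rescored


# Toy scores for five consecutive shots (made-up values).
shot_scores = [0.2, 0.8, 0.2, 0.1, 0.9]
smoothed = temporal_rescore(shot_scores, window=1, alpha=0.5)
```

Setting `alpha=0` leaves the initial scores unchanged, which gives a simple baseline for measuring how much the temporal context contributes.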