This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing documents not directly by selected multimodal features (audio, visual, or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance-feedback paradigm widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting, to adapt the learning process to the queries, is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio, and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
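To make the dissimilarity-space idea concrete, the following is a minimal sketch of how a document can be described by its distances to a set of reference documents rather than by its raw features. The prototype set and the use of a Euclidean distance are assumptions for illustration; the paper's modality-specific dissimilarities would replace them.

import numpy as np

def dissimilarity_space(features, prototypes):
    """Map documents from a feature space to a dissimilarity space.

    features:   (n_docs, d) array of one modality's descriptors
    prototypes: (n_proto, d) array of reference documents (e.g. the
                positive/negative examples of a query-by-example session)
    Returns an (n_docs, n_proto) matrix whose i-th row describes document i
    by its distances to every prototype instead of by its raw features.
    """
    diff = features[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=2)

# One such matrix per modality (visual, audio, text) can be precomputed
# offline, so only the rows involving the query examples need to be formed
# at retrieval time.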
Abstract. Different strategies to learn user semantic queries from dissimilarity representations of video audio-visual content are presented. When dealing with large corpora of video documents, using a feature representation requires the online computation of distances between all documents and a query. Hence, a dissimilarity representation may be preferred, because its offline computation speeds up the retrieval process. We show how distances related to visual and audio video features can be used directly to learn complex concepts from a set of positive and negative examples provided by the user. Based on the idea of dissimilarity spaces, we derive three algorithms to fuse modalities and thereby enhance the precision of retrieval results. The evaluation of our technique is performed on artificial data and on the complete annotated TRECVID corpus.
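As a rough illustration of learning a concept from user-provided examples in a dissimilarity space, the sketch below concatenates per-modality dissimilarity representations and trains a kernel SVM. This is only a baseline stand-in under assumed inputs; the fusion algorithms derived in the paper are more elaborate.

import numpy as np
from sklearn.svm import SVC

def fuse_and_learn(dissim_by_modality, labels, C=1.0, gamma="scale"):
    """dissim_by_modality: list of (n_examples, n_proto) dissimilarity
    matrices, one per modality, for the user's labeled examples.
    labels: +1 for positive examples, -1 for negative examples."""
    X = np.hstack(dissim_by_modality)          # early fusion by concatenation
    clf = SVC(C=C, kernel="rbf", gamma=gamma)  # kernel-based learner
    clf.fit(X, labels)
    return clf

# clf.decision_function applied to the whole corpus's dissimilarity rows
# then ranks documents, with the highest scores returned first, as in a
# relevance-feedback loop.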
Abstract. We propose an original Bayesian approach to recognize human behaviors from video streams. Mobile objects and their visual features are computed by a vision module. Then, using a Recurrent Bayesian Network, the behaviors of the mobile objects are recognized through the temporal evolution of their visual features.
Video document retrieval is now an active part of the domain of multimedia retrieval. However, unlike other media, the management of a collection of video documents adds the problem of efficiently handling an overwhelming volume of temporal data. Challenges include balancing efficient content modeling and storage against fast access at various levels. In this paper, we detail the framework we have built to accommodate our developments in content-based multimedia retrieval. We show that our framework not only facilitates the development of processing and indexing algorithms, but also opens the way to several other possibilities, such as rapid interface prototyping or the benchmarking of retrieval algorithms. In this respect, we discuss our developments in relation to wider contexts such as MPEG-7 and the TREC Video Track.

MOTIVATIONS

Video data processing has long been of high interest for the development of compression and efficient transmission algorithms. In parallel, the domain of content-based multimedia retrieval has developed, initially from text retrieval, then for images, and now addressing video content retrieval. Whereas in text and image retrieval the volume of data and the associated access techniques are well under control, this is largely not the case for video collection management. Not only may video data rapidly grow complex and huge, but it also requires efficient access techniques associated with the temporal aspect of the data. Efforts in video content modeling such as MPEG-7 [9] are providing a basis for a solution to the problem of handling large amounts of multimedia data. While such a model is very well suited to representing a single multimedia document, it cannot be used efficiently for accessing, querying, and managing a large collection of such documents, due to its inherent complexity. Unfortunately, most video retrieval systems presented in the state-of-the-art literature [1, 4] do not explicitly discuss the way they address such management issues. In this paper, we detail the framework we have constructed for the management of video document collections in the context of our research in video content analysis. Rather than presenting a temporal document model alone, our ultimate goal is to develop content characterization and indexing algorithms for the management of large video collections. When addressing such problems, one rapidly faces the need for a favorable context on which to base these developments, and one that also permits rapid and objective evaluation of research findings. From an extensible multimedia...
This paper deals with the problem of event discrimination in generic video documents. We propose an investigation into the design of an activity-based similarity measure derived from motion analysis. In an unsupervised context, our approach relies on the nonlinear temporal modeling of wavelet-based motion features estimated directly from the video frames. Based on SVM regression, this nonlinear model is able to learn the behavior of the motion descriptors along the temporal dimension and to capture useful information about the dynamic content of the shot. The resulting similarity measure shows that our approach is effective in discriminating between video events without any prior knowledge.
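A minimal sketch of nonlinear temporal modeling with SVM regression is given below: each shot's motion descriptor trace is fitted against the frame index, and two shots are compared through the cross-prediction error of their fitted models. The descriptor extraction (the wavelet-based motion features) is assumed to be done elsewhere, and the symmetric error is only a simple proxy, not the paper's similarity measure.

import numpy as np
from sklearn.svm import SVR

def fit_temporal_model(descriptor_series, C=1.0, epsilon=0.1):
    """Fit the temporal evolution of one motion descriptor over a shot.
    descriptor_series: (n_frames,) values of the descriptor."""
    t = np.arange(len(descriptor_series)).reshape(-1, 1)  # frame index as regressor
    return SVR(kernel="rbf", C=C, epsilon=epsilon).fit(t, descriptor_series)

def activity_dissimilarity(series_a, series_b):
    """Compare two shots by how well each shot's model predicts the other
    shot's descriptor trace (a simple symmetric proxy)."""
    model_a = fit_temporal_model(series_a)
    model_b = fit_temporal_model(series_b)
    t_a = np.arange(len(series_a)).reshape(-1, 1)
    t_b = np.arange(len(series_b)).reshape(-1, 1)
    err_ab = np.mean((model_a.predict(t_b) - series_b) ** 2)
    err_ba = np.mean((model_b.predict(t_a) - series_a) ** 2)
    return 0.5 * (err_ab + err_ba)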