In this paper we present the results of our work on the analysis of multi-modal data for video information retrieval, in which we exploit the properties of this data for query-time, automatic generation of weights for multi-modal data fusion. Through empirical testing we have observed that, for a given topic, a high-performing feature (one which retrieves many relevant documents) exhibits a different distribution of document scores from features that perform less well. These observations form the basis for our initial fusion model, which generates weights from these distributional properties without the need for prior training. Our model can be used not only to combine feature data, but also to combine and weight the results of multiple example query images. Our analysis and experiments were conducted on the TRECVid 2004 and 2005 collections, making use of multiple MPEG-7 low-level features and automatic speech recognition (ASR) transcripts. Our model achieves performance on a par with that of 'oracle'-determined weights, demonstrating its applicability whilst advancing the case for further investigation of score distributions.
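The abstract does not pin down the exact weight-generation statistic, so as a minimal sketch of the idea, one heuristic from this family weights each feature by how far its top-ranked scores sit above the bulk of its score distribution, then fuses the weighted lists. The `top_k` cutoff, the separation statistic, and the weighted-CombSUM fusion below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def query_time_weights(score_lists, top_k=50):
    """Assign a fusion weight to each feature from its score distribution.

    Heuristic (an illustrative assumption, not the paper's exact model):
    features whose top-ranked scores stand out sharply from the bulk of
    the distribution receive larger weights.
    """
    weights = []
    for scores in score_lists:
        s = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending
        top = s[:top_k].mean()
        bulk_mean, bulk_std = s.mean(), s.std() + 1e-9
        weights.append(max((top - bulk_mean) / bulk_std, 0.0))
    total = sum(weights) or 1.0
    return [w / total for w in weights]

def weighted_combsum(score_maps, weights):
    """Fuse per-feature {document: score} maps using the generated weights."""
    fused = {}
    for scores, w in zip(score_maps, weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Under this reading, a feature whose top documents score far above its overall mean (a sharp head in the distribution) captures more of the weight mass at query time, with no training data required.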
The growing volume of video material available over the Internet is generally accompanied by user-assigned tags or content descriptions, which are the mechanisms by which we then access such video. However, user-assigned tags have limitations for retrieval, and often we want access where the content of the video itself is directly matched against a user's query rather than against some manually assigned surrogate tag. Content-based video retrieval techniques are not yet scalable enough to allow interactive searching at Internet scale, but the techniques are proving robust and effective for smaller collections. In this article, we present three exemplar systems which demonstrate the state of the art in interactive, content-based retrieval of video shots; these are just three of the more than 20 systems developed for the 2007 iteration of the annual TRECVid benchmarking activity. The contribution of our article is to show that retrieving from video using content-based methods is now viable, that it works, and that there are many systems which now do this, such as the three outlined herein. These systems, and others like them, provide effective search over hundreds of hours of video content and are samples of the kind of content-based search functionality we can expect to see on larger video archives once issues of scale are addressed.
In this paper, we describe one of the largest multi-site interactive video retrieval experiments conducted in a laboratory setting. Interactive video retrieval performance is difficult to cross-compare, as variables exist across users, interfaces, and the underlying retrieval engine. Conducted within the framework of TRECVID 2008, we completed a multi-site, multi-interface experiment. Three institutes participated, involving 36 users: 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde & Informatica (CWI, the Netherlands). Three user interfaces were developed, all of which used the same search service. Using a Latin square arrangement, each user completed 12 topics, leading to 6 TRECVID runs per site, 18 in total. This allowed us to isolate the factors of users and interfaces from retrieval performance. In this paper we present an analysis of both the quantitative and qualitative data generated from this experiment, demonstrating that for interactive video retrieval with "novice" users, performance can vary by up to 300% for the same system using different sets of users, whilst differences in performance across interface variants were, by comparison, not statistically significant. Our results have implications for the manner in which interactive video retrieval experiments using non-expert users are evaluated. The primary focus of this paper is to highlight that non-expert users generate very large performance fluctuations, which may either mask or create system variability; a discussion of why this occurs is beyond the scope of this paper.
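For the Latin square arrangement mentioned above, a simple cyclic construction suffices to balance topic order across users: each user sees every topic exactly once, and every topic appears at each position exactly once across users. The 12 × 12 sizing follows the abstract, but the concrete user and topic identifiers below are hypothetical.

```python
def latin_square(n):
    """Cyclic n x n Latin square: each symbol occurs once per row and column."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

# Illustrative assignment: 12 users x 12 topics per site; row u gives the
# topic order for user u, so topic position is counterbalanced across users.
topics = [f"topic_{t:02d}" for t in range(12)]
square = latin_square(12)
schedule = {f"user_{u:02d}": [topics[square[u][s]] for s in range(12)]
            for u in range(12)}
```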
Content-Based Multimedia Information Retrieval (CBMIR) systems which leverage multiple retrieval experts (En) often employ a weighting scheme when combining expert results through data fusion. Typically, however, a query will comprise multiple query images (Im), leading to potentially |En| × |Im| weights to be assigned. Because of the large number of potential weights, existing approaches impose a hierarchy for data fusion, such as uniformly combining query-image results from a single retrieval expert into a single list and then weighting the results of each expert. In this paper we demonstrate that this approach is sub-optimal and contributes to the poor state of CBMIR performance in benchmarking evaluations. We utilize an optimization method known as Coordinate Ascent to discover the optimal set of weights (|En| · |Im|), revealing a dramatic gap between known results and the theoretical maximum. We find that imposing common combinatorial hierarchies for data fusion halves the optimal performance that can be achieved. By examining the optimal weight sets at the topic level, we observe that approximately 15% of the weights (from the set of |En| · |Im| weights) for any given query are assigned 70%-82% of the total weight mass for that topic. Furthermore, we discover that the ideal distribution of weights follows a log-normal distribution. We find that we can achieve up to 88% of the performance of a fully optimized query using just this 15% of the weights. Our investigation was conducted on the TRECVID evaluations 2003 to 2007 inclusive and ImageCLEFPhoto 2007, totalling 181 search topics optimized over a combined collection of 661,213 images and 1,594 topic images.
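Coordinate Ascent as described can be sketched as a greedy sweep over a flat vector holding one weight per (expert, query image) result list, re-evaluating a retrieval metric after each single-weight change and keeping only improvements. The choice of average precision as the metric, the discrete search grid, and the linear score fusion below are simplified assumptions rather than the paper's exact protocol.

```python
import numpy as np

def average_precision(ranking, relevant):
    """AP of a ranked list of document ids against a set of relevant ids."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranking, 1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / max(len(relevant), 1)

def fuse(score_matrix, docs, weights):
    """score_matrix: (n_lists, n_docs) scores, one row per (expert, image) list."""
    fused = weights @ score_matrix
    return [docs[i] for i in np.argsort(-fused)]

def coordinate_ascent(score_matrix, docs, relevant,
                      grid=np.linspace(0.0, 1.0, 11), sweeps=5):
    """Greedily optimize one weight per (expert, image) list, one at a time."""
    n = score_matrix.shape[0]
    w = np.full(n, 1.0 / n)
    best = average_precision(fuse(score_matrix, docs, w), relevant)
    for _ in range(sweeps):
        for i in range(n):
            for v in grid:
                trial = w.copy()
                trial[i] = v
                if trial.sum() == 0:
                    continue  # avoid the degenerate all-zero weight vector
                ap = average_precision(
                    fuse(score_matrix, docs, trial / trial.sum()), relevant)
                if ap > best:
                    best, w = ap, trial
    return w / w.sum(), best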