Popular image retrieval schemes generally rely on a single mode (either low-level visual features or embedded text) for searching multimedia databases. Many popular image collections (e.g., those emerging over the Internet) have associated tags, often intended for human consumption. A natural extension is to combine information from multiple modes to enhance retrieval effectiveness. In this paper, we propose two techniques: Multi-modal Latent Semantic Indexing (MMLSI) and Multi-modal Probabilistic Latent Semantic Analysis (MMpLSA). These methods are obtained by directly extending their traditional single-mode counterparts. Both methods incorporate visual features and tags by generating simultaneous semantic contexts. The experimental results demonstrate improved accuracy over other single-modal and multi-modal methods.
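As an illustration of the first of these ideas, the following is a minimal sketch of multi-modal latent semantic indexing in the spirit of MMLSI: the visual-word and tag term-document matrices are stacked and factorized with a single truncated SVD so that both modes contribute to one latent semantic space, and queries are folded into that space. The matrix layout, function names, and the fold-in query step are illustrative assumptions, not the authors' implementation.

import numpy as np

def mmlsi_index(visual_td, tag_td, k=50):
    """visual_td: (n_visual_words x n_images); tag_td: (n_tags x n_images).
    Stacking the two term-document matrices lets one SVD span both modes."""
    joint = np.vstack([visual_td, tag_td])        # one column per image, both modes
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]   # rank-k joint semantic space
    image_coords = (np.diag(s_k) @ vt_k).T        # each image as a k-dim vector
    return u_k, s_k, image_coords

def mmlsi_query(image_coords, u_k, s_k, q_visual, q_tags, top=5):
    """Fold a query (visual-word counts plus tag counts) into the latent space
    and rank database images by cosine similarity."""
    q = np.concatenate([q_visual, q_tags])
    q_lat = (q @ u_k) / s_k                       # standard LSI fold-in
    sims = image_coords @ q_lat / (
        np.linalg.norm(image_coords, axis=1) * np.linalg.norm(q_lat) + 1e-12)
    return np.argsort(-sims)[:top]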
Most traditional image retrieval methods use either low-level visual features or embedded text for representation and indexing. In recent years, there has been significant interest in combining these two modalities for effective retrieval. In this paper, we propose a tripartite graph-based representation of multi-modal data for image retrieval tasks. Our representation is ideally suited to dynamically changing or evolving datasets, where repeated semantic indexing is practically impossible. We employ a graph partitioning algorithm to retrieve semantically relevant images from a database of images represented using the tripartite graph. Being "just-in-time semantic indexing", our method is computationally light and less resource intensive. Experimental results show that the data structure used is scalable. We also show that the performance of our method is comparable to other multi-modal approaches, with significantly lower computational and resource requirements.
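A rough sketch of the tripartite structure described above, assuming each image is quantized into visual words and carries a set of tags. The incremental add_image method reflects why the representation suits evolving datasets; the retrieve method shown here is only a simple two-hop neighbourhood expansion standing in for the graph partitioning step, which the abstract does not specify. All class and method names are hypothetical.

from collections import defaultdict

class TripartiteIndex:
    """Images, visual words, and tags as the three node sets; edges only
    connect an image to the visual words and tags it contains."""
    def __init__(self):
        self.img_to_vw = defaultdict(set)    # image -> visual words
        self.img_to_tag = defaultdict(set)   # image -> tags
        self.vw_to_img = defaultdict(set)    # visual word -> images
        self.tag_to_img = defaultdict(set)   # tag -> images

    def add_image(self, image_id, visual_words, tags):
        # Incremental insertion: no global re-indexing when new images arrive.
        for w in visual_words:
            self.img_to_vw[image_id].add(w)
            self.vw_to_img[w].add(image_id)
        for t in tags:
            self.img_to_tag[image_id].add(t)
            self.tag_to_img[t].add(image_id)

    def retrieve(self, image_id, top=5):
        # Score other images by shared visual words and shared tags
        # (a stand-in for the partitioning-based retrieval in the paper).
        scores = defaultdict(int)
        for w in self.img_to_vw[image_id]:
            for other in self.vw_to_img[w]:
                scores[other] += 1
        for t in self.img_to_tag[image_id]:
            for other in self.tag_to_img[t]:
                scores[other] += 1
        scores.pop(image_id, None)
        return sorted(scores, key=scores.get, reverse=True)[:top]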
Many successful multimedia retrieval systems focus on developing efficient and effective video retrieval solutions with the help of appropriate index structures. In these systems, the query is an example video and the retrieved results are similar video clips available a priori in the database. In this paper, we address the complementary problem of filtering a video stream based on a set of given examples. By filtering, we mean detecting, and then accepting or rejecting, the parts of a video stream that match any of the given examples. This requires matching the example videos against the online video stream. Since the concepts of interest could be complex, we avoid explicitly learning a representation from the example videos to characterize the visual event they contain. We model the problem as simultaneous online spotting of multiple examples in a video stream. We employ a vocabulary trie for the filtering task and demonstrate the applicability of the technique in a variety of situations.
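A minimal sketch of trie-based simultaneous spotting, assuming both the example clips and the incoming stream have already been quantized into sequences of visual-word ids from a shared vocabulary. Each example is inserted into a trie; at every new stream symbol all active trie cursors are advanced, so several examples can be spotted in one pass. The exact-symbol matching policy and the names build_trie and spot_stream are illustrative assumptions; the paper's vocabulary trie may handle gaps and quantization noise differently.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.example_id = None            # set at the last symbol of an example

def build_trie(examples):
    """examples: dict of example_id -> list of visual-word ids."""
    root = TrieNode()
    for ex_id, seq in examples.items():
        node = root
        for sym in seq:
            node = node.children.setdefault(sym, TrieNode())
        node.example_id = ex_id
    return root

def spot_stream(root, stream):
    """Yield (position, example_id) whenever an example ends at that position
    in the stream; multiple examples are tracked simultaneously."""
    active = [root]                       # cursors for matches in progress
    for pos, sym in enumerate(stream):
        next_active = [root]              # a fresh match may start at any position
        for node in active:
            child = node.children.get(sym)
            if child is None:
                continue
            next_active.append(child)
            if child.example_id is not None:
                yield pos, child.example_id
        active = next_active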