Multi modal semantic indexing for image retrieval

Pulla, Chandrika; Jawahar, C. V.

doi:10.1145/1816041.1816091

Cited by 30 publications

(22 citation statements)

References 31 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Specifically, we describe two direct multi-modal approaches, Multi-modal Latent Semantic Indexing (MMLSI), and Multi-modal Probabilistic Latent Semantic Analysis (MM pLSA). These recent methods [3] extend the traditional semantic analysis schemes with the help of a tensorial representation. In MMLSI the data is represented by a 3-order tensor where the first dimension is text words, second is visual words and the third is the images.…”

Section: Direct Multimodal Semantic Indexingmentioning

confidence: 99%

“…Single mode ('visual' as well as 'tag') methods are compared against multimodal semantic indexing scheme(concat [13] as proposed in [14]). The tensorial methods proposed in [3] is superior to the single mode counterparts as well as other possible multimodal semantic indexing schemes.…”

Section: Inputmentioning

confidence: 99%

“…There are also attempts for extending them to multimodal data [3,11,14]. However, all of them require complex mathematical computations involving large matrices.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Tripartie Graph Models for Multi Modal Retrieval

Pulla¹,

Jawahar

2010

Procedings of the British Machine Vision Conference 2010

Self Cite

View full text Add to dashboard Cite

Most of the traditional image retrieval methods use either low level visual features or embedded text for representation and indexing. In recent years, there has been significant interest in combining these two different modalities for effective retrieval. In this paper, we propose a tri-partite graph based representation of the multi model data for image retrieval tasks. Our representation is ideally suited for dynamically changing or evolving datasets, where repeated semantic indexing is practically impossible. We employ a graph partitioning algorithm for retrieving semantically relevant images from the database of images represented using the tripartite graph. Being "just in time semantic indexing", our method is computationally light and less resource intensive. Experimental results show that the data structure used is scalable. We also show that the performance of our method is comparable with other multi model approaches, with significantly lower computational and resources requirements.

show abstract

Section: Direct Multimodal Semantic Indexingmentioning

confidence: 99%

Section: Inputmentioning

confidence: 99%

See 1 more Smart Citation

Tripartie Graph Models for Multi Modal Retrieval

Pulla¹,

Jawahar

2010

Procedings of the British Machine Vision Conference 2010

Self Cite

View full text Add to dashboard Cite

show abstract

“…Existing studies on multimodal image indexing and retrieval typically focus on techniques that can either identify a latent feature space for the image representations by fusing the multimodal feature representations, such as the Latent Semantic Indexing (LSI) [2,3], probabilistic Latent Semantic Analysis (pLSA) [10,3], and Non-negative Matrix Factorization (NMF) [1], or infer the associations among the multimodal features in order to generate a new representation for each image [7,20,9]. However, several limitations of such approaches have been identified.…”

Section: Introductionmentioning

confidence: 99%

Online Multimodal Co-indexing and Retrieval of Weakly Labeled Web Image Collections

Meng

Tan

Leung

et al. 2015

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

View full text Add to dashboard Cite

Weak supervisory information of web images, such as captions, tags, and descriptions, make it possible to better understand images at the semantic level. In this paper, we propose a novel online multimodal co-indexing algorithm based on Adaptive Resonance Theory, named OMC-ART, for the automatic co-indexing and retrieval of images using their multimodal information. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, OMC-ART builds a two-layer indexing structure, in which the first layer co-indexes the images by the key visual and textual features based on the generalized distributions of clusters they belong to; while in the second layer, images are co-indexed by their own feature distributions. Third, OMC-ART enables flexible multimodal search by using either visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to go through the whole indexing system when only a limited number of images need to be retrieved. Experiments on two published data sets demonstrate the efficiency and effectiveness of our proposed approach.

show abstract

“…Barnard et al [4] propose a translation model and a hierarchical model to represent the relationship between text and content. Several studies have attempted to use LSA technique for combining visual and textual features, including [12], [20] [ [15] and [6] who apply Probabilistic Latent Semantic Analysis for automatic image annotation or image retrieval. In the transformation model [16], text query is converted automatically into visual representations for image retrieval.…”

Section: Introductionmentioning

confidence: 99%

Multi-modal and Cross-Modal for Lecture Videos Retrieval

Nguyen

Coustaty

Ogier

2014

2014 22nd International Conference on Pattern Recognition

View full text Add to dashboard Cite

International audienceThe problem of multi-modal and cross-modal lecture videos retrieval is studied in this paper, on the basis of the use of document analysis techniques. In the context of this paper, a lecture video is represented by a set of subjects, in which a subject is represented by a Bag of mixed words -visual words and textual words-, each of them coming from speech recognition and OCR engines. Our work relies on two assumptions 1) a video may contain multiple subjects, 2) multiple modalities exist in the same lecture video document. We propose in this research a combination of technologies issuing from image document analysis and text mining. Visual words and textual words in images of lecture slides are extracted based on text detection and graphics localization computed on the sequences captured with a camera. Assuming that a subject in the video composes of a set of slides, lecture slides are clustered in different groups representing different possible subjects by using mixed words extracted. Multimodal and cross-modal lecture video retrieval are realized by the Bag of Subjects model. We discuss the proposed indexing and retrieval approach for lecture videos and report a quantitative evaluation on lecture videos of our University. It is shown that using Bag of Subjects for lecture video retrieval improves the retrieval accuracy

show abstract

Multi modal semantic indexing for image retrieval

Cited by 30 publications

References 31 publications

Tripartie Graph Models for Multi Modal Retrieval

Tripartie Graph Models for Multi Modal Retrieval

Online Multimodal Co-indexing and Retrieval of Weakly Labeled Web Image Collections

Multi-modal and Cross-Modal for Lecture Videos Retrieval

Contact Info

Product

Resources

About