Proceedings of International Conference on Multimedia Retrieval 2014
DOI: 10.1145/2578726.2578760

When textual and visual information join forces for multimedia retrieval

Cited by 44 publications (18 citation statements)
References 16 publications
“…Secondly, we use the cross-media fusion [5] of three modalities and thirdly the random-walk approach of [12]. Fourth baseline method is the non-linear fusion [22] of all modalities and finally we compare our framework with the extension of the unifying fusion framework of [2] in the case of three modalities [9] in two cases: first with the SIFT visual descriptors and second with the state-of-the-art DCNN visual features. Our proposed framework combines SIFT with DCNN using PLS Regression, using non-linear graph-based fusion of all three modalities.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…The aforementioned combination process is known as multimodal fusion. An example of a study investigating multimodal fusion is the work of [22], in which a framework for video retrieval is presented. This framework extends conventional text-based search by fusing textual and visual similarity scores in a simple non-linear way.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…A number of studies have been proposed to tackle this problem on using several training examples (typically 10 or 100 examples) [14,9,38,11,31,19,34,3,36]. Generally, in a state-of-the-art system, the event classifiers are trained by low-level and high-level features, and the final decision is derived from the fusion of the individual classification results.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Scene [8] (e.g., text, graphics drawings or images) and to use it for various applications [9,10] (e.g., multimedia search, retrieval or recommendation). An image is a visual representation of things, which is more intuitive than text.…”
Section: Comparison Results For Outlier Detection (60% Outliers) On U
Citation type: mentioning
confidence: 99%