Proceedings of the 12th Annual ACM International Conference on Multimedia 2004
DOI: 10.1145/1027527.1027665
Optimal multimodal fusion for multimedia data analysis

Abstract: Considerable research has been devoted to utilizing multimodal features for better understanding multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach.…

Cited by 197 publications (139 citation statements)
References 31 publications
“…Amir et al. [30] concatenated the concept prediction scores into a long vector, called a model vector, and stacked a support vector machine on top to learn a binary classifier for each concept. An ontology-based multi-classification algorithm was proposed by Wu et al. [13], which attempts to model possible influence relations between concepts based on a predefined ontology hierarchy.…”
Section: Exploiting Multiple-concept Relationships (mentioning)
confidence: 99%
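The model-vector stacking described in that excerpt is easy to illustrate. The sketch below is a minimal, hypothetical rendering of the idea, not Amir et al.'s actual pipeline: per-concept detector scores are concatenated into one vector, and an SVM is stacked on top as a binary classifier for one target concept. The data shapes, the detector count, and the RBF kernel are all assumptions.

```python
# Minimal sketch of the model-vector idea: base concept-detector scores are
# concatenated into one vector, and an SVM is stacked on top per concept.
# All shapes and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_shots, n_concepts = 500, 39          # assumed: 39 base concept detectors

# Assumed inputs: per-shot prediction scores of the base detectors (the
# "model vector"), plus binary labels for one target concept.
model_vectors = rng.random((n_shots, n_concepts))
labels = rng.integers(0, 2, n_shots)

# Stacked binary classifier: learns correlations among the concept scores.
stacked_svm = SVC(kernel="rbf", probability=True)
stacked_svm.fit(model_vectors, labels)

new_shot = rng.random((1, n_concepts))
print(stacked_svm.predict_proba(new_shot))  # refined concept probability
```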
“…Combining multimedia correlations in applications leverages all available information and has led to improved performance in segmentation (Hsu et al., 2004), classification (Lin & Hauptmann, 2002; de Vries, Westerveld, & Ianeva, 2004), retrieval (Wang, Ma, Xue, & Li, 2004; Wu, Chang, Chang, & Smith, 2004; Zhang, Zhang, & Ohya, 2004), and topic detection (Duygulu, Pan, & Forsyth, 2004; Xie et al., 2005). One crucial step in fusing multimodal correlations into applications is to detect and model the correlations among different data modalities.…”
Section: Multimedia Cross-modal Correlation (mentioning)
confidence: 99%
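One common way to make that correlation-detection step concrete is canonical correlation analysis (CCA) between two feature modalities. The cited works use a variety of correlation models, so the sketch below is only an illustrative stand-in under assumed feature dimensions, not the method of any particular paper.

```python
# Illustrative sketch: detect cross-modal correlation with CCA by projecting
# two modalities into a shared space and measuring per-component correlation.
# Feature dimensions and the synthetic data are assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_samples = 200
visual = rng.random((n_samples, 64))   # assumed visual features per shot
audio = rng.random((n_samples, 32))    # assumed audio features per shot

cca = CCA(n_components=4)
v_proj, a_proj = cca.fit_transform(visual, audio)

# Correlation between the projected modalities, one value per CCA component.
corrs = [np.corrcoef(v_proj[:, k], a_proj[:, k])[0, 1] for k in range(4)]
print(corrs)
```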
“…Classifiers are useful for capturing discriminative patterns across different data modalities. To identify multimodal patterns for data classification, one can use either a multimodal classifier, which takes a multimodal input, or a meta-classifier (Lin & Hauptmann, 2002; Wu et al., 2004), which takes as input the outputs of multiple unimodal classifiers.…”
Section: Multimedia Cross-modal Correlation (mentioning)
confidence: 99%
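A minimal sketch of the meta-classifier (late-fusion) setup just described: one classifier per modality, with a second-stage learner over their output scores. The feature dimensions, the SVM base learners, and the logistic-regression meta-learner are assumptions; in practice the meta-learner would be trained on held-out base-classifier outputs to avoid leakage.

```python
# Sketch of a meta-classifier: train one classifier per modality, then feed
# their output scores into a second-stage classifier. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
visual = rng.random((n, 48))           # assumed unimodal feature blocks
text = rng.random((n, 20))
y = rng.integers(0, 2, n)

# Stage 1: independent unimodal classifiers.
clf_v = SVC(probability=True).fit(visual, y)
clf_t = SVC(probability=True).fit(text, y)

# Stage 2: meta-classifier over the unimodal output scores. (For brevity the
# same data is reused here; a real setup would use a held-out split.)
meta_inputs = np.column_stack([clf_v.predict_proba(visual)[:, 1],
                               clf_t.predict_proba(text)[:, 1]])
meta = LogisticRegression().fit(meta_inputs, y)
print(meta.predict(meta_inputs[:5]))
```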
“…This can be seen in recently published approaches that solve similar tasks yet use different levels of information fusion. Examples using classifier fusion include multimedia retrieval [28], multi-modal object recognition [12], multibiometrics [15], and video retrieval [29]. For data fusion, examples include multimedia summarization [1], text and image categorization [7], multi-modal image retrieval [27], and web document retrieval [19].…”
Section: Introduction (mentioning)
confidence: 99%
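For contrast with the classifier-fusion sketch above, here is an equally minimal data-fusion (early-fusion) sketch: modality features are concatenated into one joint vector before a single classifier is trained. The dimensions and the SVM choice are assumptions, not drawn from any of the cited systems.

```python
# Data fusion (early fusion): concatenate the modality feature vectors into
# one joint feature space, then train a single classifier on the result.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
visual = rng.random((n, 48))           # assumed per-item visual features
text = rng.random((n, 20))             # assumed per-item text features
y = rng.integers(0, 2, n)

fused = np.hstack([visual, text])      # one joint feature vector per item
clf = SVC().fit(fused, y)
print(clf.score(fused, y))             # training accuracy on synthetic data
```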