2013
DOI: 10.1007/s00138-013-0525-x

Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by 47 publications (22 citation statements)
References 45 publications
“…When combined with a linear SVM, excellent results on the leading NIST TRECVID event detection benchmarks [54] are reported for scenarios where both many and few examples are available. The CNN video representation outperforms more traditional video encodings such as improved dense trajectories [71], [72] and multimedia representations combining appearance, motion and audio features [49], [52], [81]. However, both the learned and engineered representations are neither capable of, nor intended for, recognizing events when examples are completely absent.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
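
The statement above refers to pairing a CNN video representation with a linear SVM for event detection. The following is a minimal sketch of that generic pipeline, assuming per-video CNN features have already been extracted and pooled; the feature dimensionality, the synthetic data, and the scikit-learn LinearSVC choice are illustrative assumptions, not details of the cited systems.

import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

# Placeholder data: 200 videos, each described by a 4096-d pooled CNN feature.
# Labels are synthetic: 1 = event present, 0 = background.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096)).astype(np.float32)
y = rng.integers(0, 2, size=200)

X = normalize(X)                         # L2-normalize the per-video features

clf = LinearSVC(C=1.0)                   # linear SVM event classifier
clf.fit(X[:150], y[:150])                # train on the first 150 videos

scores = clf.decision_function(X[150:])  # confidence scores for the held-out videos
ranking = np.argsort(-scores)            # rank test videos by event confidence
print(ranking[:10])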
“…from the text search [9,15,23,6] or the visual search [28,5]. Due to the challenging nature of multimedia retrieval, features from multiple modalities are usually combined to achieve better performance [20,8,24]. However, performing PRF on multimodal tasks such as event search is an important yet unaddressed problem.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
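
The PRF mentioned above is pseudo-relevance feedback: the top results of an initial search are treated as if they were relevant and are used to refine the ranking. The following is a minimal, generic sketch of that idea, assuming fused multimodal feature vectors and an initial score per video are already available; the top-k/bottom-k split and the logistic-regression re-ranker are illustrative assumptions, not the approach of any cited work.

import numpy as np
from sklearn.linear_model import LogisticRegression

def prf_rerank(features, init_scores, top_k=10, bottom_k=50):
    # Treat the highest-ranked items as pseudo-positives and the
    # lowest-ranked items as pseudo-negatives, then re-score everything.
    order = np.argsort(-init_scores)
    pos = order[:top_k]
    neg = order[-bottom_k:]
    X = np.vstack([features[pos], features[neg]])
    y = np.concatenate([np.ones(top_k), np.zeros(bottom_k)])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.predict_proba(features)[:, 1]   # refreshed ranking scores

# Placeholder data: 500 videos, 128-d fused multimodal features, initial scores.
rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 128))
initial = rng.normal(size=500)
print(prf_rerank(feats, initial)[:5])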
“…Wang et al. [38] discussed a notable TRECVID 2012 system characterized by applying feature selection over so-called motion relativity features. Oh et al. [31] presented a latent SVM event detector that enables temporal evidence localization. Jiang et al. [19] presented an efficient method to learn "optimal" spatial event representations from data.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
“…A number of studies have been proposed to tackle this problem using a limited number of training examples (typically 10 or 100) [14,9,38,11,31,19,34,3,36]. Generally, in a state-of-the-art system, event classifiers are trained on low-level and high-level features, and the final decision is derived from the fusion of the individual classification results.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
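
The last statement describes the common late-fusion design: each feature type gets its own event classifier, and the final decision combines the individual scores. The following is a minimal sketch of weighted late fusion under that assumption; the three modalities and the fixed weights are placeholders, since in practice fusion weights are usually tuned on validation data.

import numpy as np

def late_fusion(score_lists, weights=None):
    # score_lists: list of 1-D arrays, one score per video from each classifier.
    scores = np.vstack(score_lists)                  # (n_classifiers, n_videos)
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    weights = np.asarray(weights)[:, None]
    return (weights * scores).sum(axis=0)            # fused score per video

# Placeholder scores from three hypothetical classifiers
# (e.g. appearance, motion, and audio features).
rng = np.random.default_rng(2)
appearance, motion, audio = (rng.normal(size=100) for _ in range(3))
fused = late_fusion([appearance, motion, audio], weights=[0.5, 0.3, 0.2])
print(fused[:5])

Uniform averaging is the simplest choice; weighted or rank-based combinations of the per-classifier scores are common alternatives.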