2017
DOI: 10.1109/tcyb.2016.2539546

Bi-Level Semantic Representation Analysis for Multimedia Event Detection

Abstract: Multimedia event detection has been one of the major endeavors in video event analysis. A variety of approaches have been proposed recently to tackle this problem. Among others, using semantic representation has been accredited for its promising performance and desirable ability for human-understandable reasoning. To generate semantic representation, we usually utilize several external image/video archives and apply the concept detectors trained on them to the event videos. Due to the intrinsic difference of t…

Cited by 220 publications (44 citation statements)
References 35 publications

“…Most of the traffic sign detection methods are focused on images. Another focus of study can be to analyze traffic videos for traffic sign detection by leveraging the semantic representations [20,21]. Yet another focus could consider mining the correlations between the features of traffic signs by a semi-supervised feature selection framework [22] in traffic videos.…”
Section: Related Work (mentioning)
confidence: 99%
“…The recall rate was inferior to other methods. [13] 92.15% 89.17% -CNN(Zhu) [13,22] 94% 91% -ConvNet [20] 96.49% 99.89% 0.027…”
Section: Evaluations on GTSDB (mentioning)
confidence: 99%
“…We obtained raw Digital Imaging and Communications in Medicine (DICOM) MRI scans from the public ADNI website, where these MRI scans have been reviewed for quality, and automatically corrected for spatial distortion caused by gradient nonlinearity and B1 field inhomogeneity. We then processed all MR images following the same procedures in [38], [39] as detailed below:…”
Section: Materials and Data Preprocessing (mentioning)
confidence: 99%
“…In the past decades, a large number of efforts have been devoted to revealing the inter-modal correspondence via learning a shared embedding space for cross-modal similarity measurement Kang et al (2015a); Wang et al (2015); Jin et al (2015); Menon et al (2015); Irie et al (2015); Chang et al (2017b). For example, Canonical Correlation Analysis (CCA) and its extensions to kernel version Hotelling (1936); Hardoon et al (2004) aim to learn a common representation by mutually maximizing the correlation between their projections onto the shared basis vectors; Latent Dirichlet Allocation (LDA) based methods ; Barnard et al (2003); Wang et al (2009); Jia et al (2011); Xiaojun Chang and Hauptmann (2017) establish the shared latent semantic model through the joint distribution of images and the corresponding annotations as well as the conditional relationships between them.…”
Section: arXiv:1702.01229v2 [cs.LG] 7 Jul 2017 (mentioning)
confidence: 99%